2003-02-24 15:13:07 -05:00
|
|
|
|
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
|
|
|
|
|
* Copyright by the Board of Trustees of the University of Illinois. *
|
|
|
|
|
* All rights reserved. *
|
|
|
|
|
* *
|
|
|
|
|
* This file is part of HDF5. The full HDF5 copyright notice, including *
|
|
|
|
|
* terms governing use, modification, and redistribution, is contained in *
|
|
|
|
|
* the files COPYING and Copyright.html. COPYING can be found at the root *
|
|
|
|
|
* of the source code distribution tree; Copyright.html can be found at the *
|
|
|
|
|
* root level of an installed copy of the electronic HDF5 document set and *
|
|
|
|
|
* is linked from the top-level documents page. It can also be found at *
|
|
|
|
|
* http://hdf.ncsa.uiuc.edu/HDF5/doc/Copyright.html. If you do not have *
|
|
|
|
|
* access to either file, you may request a copy from hdfhelp@ncsa.uiuc.edu. *
|
|
|
|
|
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
|
|
|
|
|
|
|
|
|
|
/* Programmer: Quincey Koziol <koziol@ncsa.uiuc.ued>
|
1998-07-06 16:01:13 -05:00
|
|
|
|
* Friday, May 29, 1998
|
|
|
|
|
*
|
2003-02-24 15:13:07 -05:00
|
|
|
|
* Purpose: Dataspace selection functions.
|
1998-07-06 16:01:13 -05:00
|
|
|
|
*/
|
|
|
|
|
|
2000-10-10 02:43:38 -05:00
|
|
|
|
#define H5S_PACKAGE /*suppress error about including H5Spkg */
|
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
#include "H5private.h" /* Generic Functions */
|
|
|
|
|
#include "H5Dprivate.h" /* Datasets (for their properties) */
|
|
|
|
|
#include "H5Eprivate.h" /* Error handling */
|
|
|
|
|
#include "H5FLprivate.h" /* Free Lists */
|
|
|
|
|
#include "H5Iprivate.h" /* ID Functions */
|
|
|
|
|
#include "H5Spkg.h" /* Dataspace functions */
|
|
|
|
|
#include "H5Vprivate.h" /* Vector functions */
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
|
|
|
|
/* Interface initialization */
|
1999-04-14 16:48:05 -05:00
|
|
|
|
#define PABLO_MASK H5Sselect_mask
|
1998-11-20 22:36:51 -05:00
|
|
|
|
#define INTERFACE_INIT NULL
|
2001-08-14 17:09:56 -05:00
|
|
|
|
static int interface_initialize_g = 0;
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
2000-04-04 16:00:31 -05:00
|
|
|
|
/* Declare external the free list for hssize_t arrays */
|
|
|
|
|
H5FL_ARR_EXTERN(hssize_t);
|
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
/* Declare a free list to manage arrays of size_t */
|
|
|
|
|
H5FL_ARR_DEFINE_STATIC(size_t,-1);
|
|
|
|
|
|
|
|
|
|
/* Declare a free list to manage arrays of hsize_t */
|
|
|
|
|
H5FL_ARR_DEFINE_STATIC(hsize_t,-1);
|
|
|
|
|
|
|
|
|
|
/* Declare a free list to manage blocks of single datatype element data */
|
|
|
|
|
H5FL_BLK_EXTERN(type_elem);
|
|
|
|
|
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
|
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
|
|
|
|
H5S_get_vector_size
|
|
|
|
|
PURPOSE
|
|
|
|
|
Gets the size of the I/O vector
|
|
|
|
|
USAGE
|
|
|
|
|
ssize_t H5S_get_vector_size(dxpl_id)
|
|
|
|
|
hid_t dxpl_id; IN: The dataset transfer property list to query
|
|
|
|
|
RETURNS
|
|
|
|
|
Non-negative number of entries in I/O vector on success, negative on failure
|
|
|
|
|
DESCRIPTION
|
|
|
|
|
Retrieves the number of I/O vector entries to use for a given dataset
|
|
|
|
|
transfer. If the default dataset property list is used, the default
|
|
|
|
|
number of I/O vectors is returned.
|
|
|
|
|
GLOBAL VARIABLES
|
|
|
|
|
COMMENTS, BUGS, ASSUMPTIONS
|
|
|
|
|
EXAMPLES
|
|
|
|
|
REVISION LOG
|
|
|
|
|
--------------------------------------------------------------------------*/
|
|
|
|
|
static ssize_t
|
|
|
|
|
H5S_get_vector_size(hid_t dxpl_id)
|
|
|
|
|
{
|
|
|
|
|
ssize_t ret_value; /* return value */
|
|
|
|
|
|
|
|
|
|
FUNC_ENTER_NOINIT(H5S_get_vector_size);
|
|
|
|
|
|
|
|
|
|
if(dxpl_id==H5P_DATASET_XFER_DEFAULT) {
|
|
|
|
|
ret_value=H5D_XFER_HYPER_VECTOR_SIZE_DEF;
|
|
|
|
|
} /* end if */
|
|
|
|
|
else {
|
|
|
|
|
H5P_genplist_t *dx_plist; /* Dataset transfer property list */
|
|
|
|
|
|
|
|
|
|
/* Get the hyperslab vector size */
|
|
|
|
|
if(NULL == (dx_plist = H5I_object(dxpl_id)))
|
|
|
|
|
HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, FAIL, "not a dataset transfer property list");
|
|
|
|
|
if (H5P_get(dx_plist,H5D_XFER_HYPER_VECTOR_SIZE_NAME,&ret_value)<0)
|
|
|
|
|
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "unable to get value");
|
|
|
|
|
} /* end else */
|
|
|
|
|
|
|
|
|
|
done:
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
} /* H5S_get_vector_size() */
|
|
|
|
|
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
|
|
|
|
H5S_select_copy
|
|
|
|
|
PURPOSE
|
|
|
|
|
Copy a selection from one dataspace to another
|
|
|
|
|
USAGE
|
|
|
|
|
herr_t H5S_select_copy(dst, src)
|
|
|
|
|
H5S_t *dst; OUT: Pointer to the destination dataspace
|
|
|
|
|
H5S_t *src; IN: Pointer to the source dataspace
|
|
|
|
|
RETURNS
|
1998-10-26 16:18:54 -05:00
|
|
|
|
Non-negative on success/Negative on failure
|
1998-07-06 16:01:13 -05:00
|
|
|
|
DESCRIPTION
|
|
|
|
|
Copies all the selection information (include offset) from the source
|
|
|
|
|
dataspace to the destination dataspace.
|
|
|
|
|
GLOBAL VARIABLES
|
|
|
|
|
COMMENTS, BUGS, ASSUMPTIONS
|
|
|
|
|
EXAMPLES
|
|
|
|
|
REVISION LOG
|
|
|
|
|
--------------------------------------------------------------------------*/
|
1998-07-08 10:05:01 -05:00
|
|
|
|
herr_t
|
|
|
|
|
H5S_select_copy (H5S_t *dst, const H5S_t *src)
|
1998-07-06 16:01:13 -05:00
|
|
|
|
{
|
1998-07-22 17:11:22 -05:00
|
|
|
|
herr_t ret_value=SUCCEED; /* return value */
|
|
|
|
|
|
2002-05-29 10:07:55 -05:00
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_copy, FAIL);
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
|
|
|
|
/* Check args */
|
|
|
|
|
assert(dst);
|
|
|
|
|
assert(src);
|
|
|
|
|
|
1998-07-22 17:11:22 -05:00
|
|
|
|
/* Copy regular fields */
|
|
|
|
|
dst->select=src->select;
|
|
|
|
|
|
1998-08-03 19:30:35 -05:00
|
|
|
|
/* Need to copy order information still */
|
|
|
|
|
|
|
|
|
|
/* Copy offset information */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if (NULL==(dst->select.offset = H5FL_ARR_CALLOC(hssize_t,src->extent.u.simple.rank)))
|
2002-08-08 11:52:55 -05:00
|
|
|
|
HGOTO_ERROR (H5E_RESOURCE, H5E_NOSPACE, FAIL, "memory allocation failed");
|
1998-08-03 19:30:35 -05:00
|
|
|
|
if(src->select.offset!=NULL)
|
|
|
|
|
HDmemcpy(dst->select.offset,src->select.offset,(src->extent.u.simple.rank*sizeof(hssize_t)));
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
|
|
|
|
/* Perform correct type of copy based on the type of selection */
|
1998-07-22 17:11:22 -05:00
|
|
|
|
switch (src->extent.type) {
|
|
|
|
|
case H5S_SCALAR:
|
|
|
|
|
/*nothing needed */
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SIMPLE:
|
|
|
|
|
/* Deep copy extra stuff */
|
|
|
|
|
switch(src->select.type) {
|
|
|
|
|
case H5S_SEL_NONE:
|
|
|
|
|
case H5S_SEL_ALL:
|
|
|
|
|
/*nothing needed */
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SEL_POINTS:
|
|
|
|
|
ret_value=H5S_point_copy(dst,src);
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SEL_HYPERSLABS:
|
|
|
|
|
ret_value=H5S_hyper_copy(dst,src);
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
default:
|
|
|
|
|
assert("unknown selection type" && 0);
|
|
|
|
|
break;
|
|
|
|
|
} /* end switch */
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_COMPLEX:
|
|
|
|
|
/*void */
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
default:
|
|
|
|
|
assert("unknown data space type" && 0);
|
|
|
|
|
break;
|
2002-07-31 10:27:07 -05:00
|
|
|
|
} /* end switch */
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
2002-08-08 11:52:55 -05:00
|
|
|
|
done:
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
1998-07-06 16:01:13 -05:00
|
|
|
|
} /* H5S_select_copy() */
|
1998-07-23 18:29:44 -05:00
|
|
|
|
|
1998-07-24 15:46:19 -05:00
|
|
|
|
|
1998-07-06 16:01:13 -05:00
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
1998-08-31 22:35:23 -05:00
|
|
|
|
H5Sget_select_npoints
|
1998-07-06 16:01:13 -05:00
|
|
|
|
PURPOSE
|
|
|
|
|
Get the number of elements in current selection
|
|
|
|
|
USAGE
|
1998-10-26 12:42:48 -05:00
|
|
|
|
hssize_t H5Sget_select_npoints(dsid)
|
1998-07-06 16:01:13 -05:00
|
|
|
|
hid_t dsid; IN: Dataspace ID of selection to query
|
|
|
|
|
RETURNS
|
|
|
|
|
The number of elements in selection on success, 0 on failure
|
|
|
|
|
DESCRIPTION
|
|
|
|
|
Returns the number of elements in current selection for dataspace.
|
|
|
|
|
GLOBAL VARIABLES
|
|
|
|
|
COMMENTS, BUGS, ASSUMPTIONS
|
|
|
|
|
EXAMPLES
|
|
|
|
|
REVISION LOG
|
|
|
|
|
--------------------------------------------------------------------------*/
|
1998-10-26 12:42:48 -05:00
|
|
|
|
hssize_t
|
1998-08-31 22:35:23 -05:00
|
|
|
|
H5Sget_select_npoints(hid_t spaceid)
|
1998-07-06 16:01:13 -05:00
|
|
|
|
{
|
|
|
|
|
H5S_t *space = NULL; /* Dataspace to modify selection of */
|
2002-08-08 11:52:55 -05:00
|
|
|
|
hssize_t ret_value; /* return value */
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
2002-05-29 10:07:55 -05:00
|
|
|
|
FUNC_ENTER_API(H5Sget_select_npoints, 0);
|
1998-10-26 14:55:54 -05:00
|
|
|
|
H5TRACE1("Hs","i",spaceid);
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
|
|
|
|
/* Check args */
|
2002-07-31 14:17:12 -05:00
|
|
|
|
if (NULL == (space=H5I_object_verify(spaceid, H5I_DATASPACE)))
|
2002-08-08 11:52:55 -05:00
|
|
|
|
HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, 0, "not a data space");
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
2002-07-31 10:27:07 -05:00
|
|
|
|
ret_value = (*space->select.get_npoints)(space);
|
1998-07-06 16:01:13 -05:00
|
|
|
|
|
2002-08-08 11:52:55 -05:00
|
|
|
|
done:
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_API(ret_value);
|
1998-08-31 22:35:23 -05:00
|
|
|
|
} /* H5Sget_select_npoints() */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
1998-08-03 19:30:35 -05:00
|
|
|
|
|
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
|
|
|
|
H5Sselect_valid
|
|
|
|
|
PURPOSE
|
|
|
|
|
Check whether the selection fits within the extent, with the current
|
|
|
|
|
offset defined.
|
|
|
|
|
USAGE
|
1998-10-26 17:44:13 -05:00
|
|
|
|
htri_t H5Sselect_void(dsid)
|
1998-08-03 19:30:35 -05:00
|
|
|
|
hid_t dsid; IN: Dataspace ID to query
|
|
|
|
|
RETURNS
|
|
|
|
|
TRUE if the selection fits within the extent, FALSE if it does not and
|
1998-10-26 16:18:54 -05:00
|
|
|
|
Negative on an error.
|
1998-08-03 19:30:35 -05:00
|
|
|
|
DESCRIPTION
|
|
|
|
|
Determines if the current selection at the current offet fits within the
|
|
|
|
|
extent for the dataspace.
|
|
|
|
|
GLOBAL VARIABLES
|
|
|
|
|
COMMENTS, BUGS, ASSUMPTIONS
|
|
|
|
|
EXAMPLES
|
|
|
|
|
REVISION LOG
|
|
|
|
|
--------------------------------------------------------------------------*/
|
1998-10-26 17:44:13 -05:00
|
|
|
|
htri_t
|
1998-08-27 11:48:50 -05:00
|
|
|
|
H5Sselect_valid(hid_t spaceid)
|
1998-08-03 19:30:35 -05:00
|
|
|
|
{
|
|
|
|
|
H5S_t *space = NULL; /* Dataspace to modify selection of */
|
2002-08-08 11:52:55 -05:00
|
|
|
|
htri_t ret_value; /* return value */
|
1998-08-03 19:30:35 -05:00
|
|
|
|
|
2002-05-29 10:07:55 -05:00
|
|
|
|
FUNC_ENTER_API(H5Sselect_valid, 0);
|
1998-08-03 19:30:35 -05:00
|
|
|
|
H5TRACE1("b","i",spaceid);
|
|
|
|
|
|
|
|
|
|
/* Check args */
|
2002-07-31 14:17:12 -05:00
|
|
|
|
if (NULL == (space=H5I_object_verify(spaceid, H5I_DATASPACE)))
|
2002-08-08 11:52:55 -05:00
|
|
|
|
HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, 0, "not a data space");
|
1998-08-03 19:30:35 -05:00
|
|
|
|
|
2002-07-31 10:27:07 -05:00
|
|
|
|
ret_value = (*space->select.is_valid)(space);
|
1998-08-03 19:30:35 -05:00
|
|
|
|
|
2002-08-08 11:52:55 -05:00
|
|
|
|
done:
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_API(ret_value);
|
1998-08-03 19:30:35 -05:00
|
|
|
|
} /* H5Sselect_valid() */
|
2001-11-02 15:31:35 -05:00
|
|
|
|
|
1998-11-24 19:29:09 -05:00
|
|
|
|
|
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
|
|
|
|
H5S_select_deserialize
|
|
|
|
|
PURPOSE
|
|
|
|
|
Deserialize the current selection from a user-provided buffer into a real
|
2002-07-31 10:27:07 -05:00
|
|
|
|
selection in the dataspace.
|
1999-03-10 18:50:03 -05:00
|
|
|
|
USAGE
|
2002-07-31 10:27:07 -05:00
|
|
|
|
herr_t H5S_select_deserialize(space, buf)
|
|
|
|
|
H5S_t *space; IN/OUT: Dataspace pointer to place selection into
|
|
|
|
|
uint8 *buf; IN: Buffer to retrieve serialized selection from
|
1999-03-10 18:50:03 -05:00
|
|
|
|
RETURNS
|
2002-07-31 10:27:07 -05:00
|
|
|
|
Non-negative on success/Negative on failure
|
1999-03-10 18:50:03 -05:00
|
|
|
|
DESCRIPTION
|
2002-07-31 10:27:07 -05:00
|
|
|
|
Deserializes the current selection into a buffer. (Primarily for retrieving
|
|
|
|
|
from disk). This routine just hands off to the appropriate routine for each
|
|
|
|
|
type of selection. The format of the serialized information is shown in
|
|
|
|
|
the H5S_select_serialize() header.
|
1999-03-10 18:50:03 -05:00
|
|
|
|
GLOBAL VARIABLES
|
|
|
|
|
COMMENTS, BUGS, ASSUMPTIONS
|
|
|
|
|
EXAMPLES
|
|
|
|
|
REVISION LOG
|
|
|
|
|
--------------------------------------------------------------------------*/
|
2002-07-31 10:27:07 -05:00
|
|
|
|
herr_t
|
|
|
|
|
H5S_select_deserialize (H5S_t *space, const uint8_t *buf)
|
1999-03-10 18:50:03 -05:00
|
|
|
|
{
|
2002-07-31 10:27:07 -05:00
|
|
|
|
const uint8_t *tbuf; /* Temporary pointer to the selection type */
|
|
|
|
|
uint32_t sel_type; /* Pointer to the selection type */
|
|
|
|
|
herr_t ret_value=FAIL; /* return value */
|
1999-03-10 18:50:03 -05:00
|
|
|
|
|
2002-07-31 10:27:07 -05:00
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_deserialize, FAIL);
|
1999-03-10 18:50:03 -05:00
|
|
|
|
|
|
|
|
|
assert(space);
|
|
|
|
|
|
2002-07-31 10:27:07 -05:00
|
|
|
|
tbuf=buf;
|
|
|
|
|
UINT32DECODE(tbuf, sel_type);
|
|
|
|
|
switch(sel_type) {
|
1999-03-10 18:50:03 -05:00
|
|
|
|
case H5S_SEL_POINTS: /* Sequence of points selected */
|
2002-07-31 10:27:07 -05:00
|
|
|
|
ret_value=H5S_point_deserialize(space,buf);
|
1999-03-10 18:50:03 -05:00
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SEL_HYPERSLABS: /* Hyperslab selection defined */
|
2002-07-31 10:27:07 -05:00
|
|
|
|
ret_value=H5S_hyper_deserialize(space,buf);
|
1999-03-10 18:50:03 -05:00
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SEL_ALL: /* Entire extent selected */
|
2002-07-31 10:27:07 -05:00
|
|
|
|
ret_value=H5S_all_deserialize(space,buf);
|
1999-03-10 18:50:03 -05:00
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SEL_NONE: /* Nothing selected */
|
2002-07-31 10:27:07 -05:00
|
|
|
|
ret_value=H5S_none_deserialize(space,buf);
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
default:
|
1999-03-10 18:50:03 -05:00
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
|
2002-08-09 15:48:23 -05:00
|
|
|
|
done:
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
2002-07-31 10:27:07 -05:00
|
|
|
|
} /* H5S_select_deserialize() */
|
2001-11-02 15:31:35 -05:00
|
|
|
|
|
1999-03-10 18:50:03 -05:00
|
|
|
|
|
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
|
|
|
|
H5Sget_select_bounds
|
|
|
|
|
PURPOSE
|
|
|
|
|
Gets the bounding box containing the selection.
|
|
|
|
|
USAGE
|
|
|
|
|
herr_t H5S_get_select_bounds(space, start, end)
|
|
|
|
|
hid_t dsid; IN: Dataspace ID of selection to query
|
|
|
|
|
hsize_t *start; OUT: Starting coordinate of bounding box
|
|
|
|
|
hsize_t *end; OUT: Opposite coordinate of bounding box
|
|
|
|
|
RETURNS
|
|
|
|
|
Non-negative on success, negative on failure
|
|
|
|
|
DESCRIPTION
|
|
|
|
|
Retrieves the bounding box containing the current selection and places
|
|
|
|
|
it into the user's buffers. The start and end buffers must be large
|
|
|
|
|
enough to hold the dataspace rank number of coordinates. The bounding box
|
|
|
|
|
exactly contains the selection, ie. if a 2-D element selection is currently
|
|
|
|
|
defined with the following points: (4,5), (6,8) (10,7), the bounding box
|
|
|
|
|
with be (4, 5), (10, 8). Calling this function on a "none" selection
|
|
|
|
|
returns fail.
|
|
|
|
|
The bounding box calculations _does_ include the current offset of the
|
|
|
|
|
selection within the dataspace extent.
|
|
|
|
|
GLOBAL VARIABLES
|
|
|
|
|
COMMENTS, BUGS, ASSUMPTIONS
|
|
|
|
|
EXAMPLES
|
|
|
|
|
REVISION LOG
|
|
|
|
|
--------------------------------------------------------------------------*/
|
|
|
|
|
herr_t
|
|
|
|
|
H5Sget_select_bounds(hid_t spaceid, hsize_t *start, hsize_t *end)
|
|
|
|
|
{
|
|
|
|
|
H5S_t *space = NULL; /* Dataspace to modify selection of */
|
2002-08-08 11:52:55 -05:00
|
|
|
|
herr_t ret_value; /* return value */
|
1999-03-10 18:50:03 -05:00
|
|
|
|
|
2002-05-29 10:07:55 -05:00
|
|
|
|
FUNC_ENTER_API(H5Sget_select_bounds, FAIL);
|
1999-03-12 13:35:04 -05:00
|
|
|
|
H5TRACE3("e","i*h*h",spaceid,start,end);
|
1999-03-10 18:50:03 -05:00
|
|
|
|
|
|
|
|
|
/* Check args */
|
|
|
|
|
if(start==NULL || end==NULL)
|
2002-08-08 11:52:55 -05:00
|
|
|
|
HGOTO_ERROR(H5E_ARGS, H5E_BADVALUE, FAIL, "invalid pointer");
|
2002-07-31 14:17:12 -05:00
|
|
|
|
if (NULL == (space=H5I_object_verify(spaceid, H5I_DATASPACE)))
|
2002-08-08 11:52:55 -05:00
|
|
|
|
HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, FAIL, "not a data space");
|
1999-03-10 18:50:03 -05:00
|
|
|
|
|
2002-07-31 10:27:07 -05:00
|
|
|
|
ret_value = (*space->select.bounds)(space,start,end);
|
1999-03-10 18:50:03 -05:00
|
|
|
|
|
2002-08-08 11:52:55 -05:00
|
|
|
|
done:
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_API(ret_value);
|
1999-03-10 18:50:03 -05:00
|
|
|
|
} /* H5Sget_select_bounds() */
|
2001-11-02 15:31:35 -05:00
|
|
|
|
|
1999-06-23 21:16:13 -05:00
|
|
|
|
|
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
2002-04-09 07:47:34 -05:00
|
|
|
|
H5S_select_iterate
|
1999-06-23 21:16:13 -05:00
|
|
|
|
PURPOSE
|
|
|
|
|
Iterate over the selected elements in a memory buffer.
|
|
|
|
|
USAGE
|
|
|
|
|
herr_t H5S_select_iterate(buf, type_id, space, operator, operator_data)
|
|
|
|
|
void *buf; IN/OUT: Buffer containing elements to iterate over
|
|
|
|
|
hid_t type_id; IN: Datatype ID of BUF array.
|
|
|
|
|
H5S_t *space; IN: Dataspace object containing selection to iterate over
|
1999-08-10 13:54:06 -05:00
|
|
|
|
H5D_operator_t op; IN: Function pointer to the routine to be
|
1999-06-23 21:16:13 -05:00
|
|
|
|
called for each element in BUF iterated over.
|
|
|
|
|
void *operator_data; IN/OUT: Pointer to any user-defined data
|
|
|
|
|
associated with the operation.
|
|
|
|
|
RETURNS
|
|
|
|
|
Returns the return value of the last operator if it was non-zero, or zero
|
|
|
|
|
if all elements were processed. Otherwise returns a negative value.
|
|
|
|
|
DESCRIPTION
|
|
|
|
|
Iterates over the selected elements in a memory buffer, calling the user's
|
|
|
|
|
callback function for each element. The selection in the dataspace is
|
|
|
|
|
modified so that any elements already iterated over are removed from the
|
|
|
|
|
selection if the iteration is interrupted (by the H5D_operator_t function
|
|
|
|
|
returning non-zero) in the "middle" of the iteration and may be re-started
|
|
|
|
|
by the user where it left off.
|
|
|
|
|
|
|
|
|
|
NOTE: Until "subtracting" elements from a selection is implemented,
|
|
|
|
|
the selection is not modified.
|
|
|
|
|
--------------------------------------------------------------------------*/
|
|
|
|
|
herr_t
|
1999-08-10 13:54:06 -05:00
|
|
|
|
H5S_select_iterate(void *buf, hid_t type_id, H5S_t *space, H5D_operator_t op,
|
1999-06-23 21:16:13 -05:00
|
|
|
|
void *operator_data)
|
|
|
|
|
{
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
H5T_t *dt; /* Datatype structure */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
H5S_sel_iter_t iter; /* Selection iteration info */
|
|
|
|
|
hbool_t iter_init=0; /* Selection iteration info has been initialized */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
uint8_t *loc; /* Current element location in buffer */
|
|
|
|
|
hssize_t coords[H5O_LAYOUT_NDIMS]; /* Coordinates of element in dataspace */
|
|
|
|
|
hssize_t nelmts; /* Number of elements in selection */
|
|
|
|
|
hsize_t space_size[H5O_LAYOUT_NDIMS]; /* Dataspace size */
|
|
|
|
|
hsize_t *off=NULL; /* Array to store sequence offsets */
|
|
|
|
|
hsize_t curr_off; /* Current offset within sequence */
|
|
|
|
|
hsize_t tmp_off; /* Temporary offset within sequence */
|
|
|
|
|
size_t *len=NULL; /* Array to store sequence lengths */
|
|
|
|
|
size_t curr_len; /* Length of bytes left to process in sequence */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
ssize_t vector_size; /* Value for vector size */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t nseq; /* Number of sequences generated */
|
|
|
|
|
size_t curr_seq; /* Current sequnce being worked on */
|
|
|
|
|
size_t nbytes; /* Number of bytes used in sequences */
|
|
|
|
|
size_t max_bytes; /* Maximum number of bytes allowed in sequences */
|
|
|
|
|
size_t elmt_size; /* Datatype size */
|
|
|
|
|
int ndims; /* Number of dimensions in dataspace */
|
|
|
|
|
int i; /* Local Index variable */
|
|
|
|
|
herr_t user_ret=0; /* User's return value */
|
|
|
|
|
herr_t ret_value=SUCCEED; /* Return value */
|
1999-06-23 21:16:13 -05:00
|
|
|
|
|
2002-05-29 10:07:55 -05:00
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_iterate, FAIL);
|
1999-06-23 21:16:13 -05:00
|
|
|
|
|
|
|
|
|
/* Check args */
|
|
|
|
|
assert(buf);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
assert(H5I_DATATYPE == H5I_get_type(type_id));
|
1999-06-23 21:16:13 -05:00
|
|
|
|
assert(space);
|
1999-08-10 13:54:06 -05:00
|
|
|
|
assert(op);
|
1999-06-23 21:16:13 -05:00
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
/* Get the hyperslab vector size */
|
|
|
|
|
/* (from the default data transfer property list, for now) */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((vector_size=H5S_get_vector_size(H5P_DATASET_XFER_DEFAULT))<0)
|
|
|
|
|
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "unable to get I/O vector size");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Allocate the vector I/O arrays */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O offset vector array");
|
|
|
|
|
|
|
|
|
|
/* Get the datatype size */
|
2002-07-31 14:17:12 -05:00
|
|
|
|
if (NULL==(dt=H5I_object_verify(type_id,H5I_DATATYPE)))
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, FAIL, "not an valid base datatype");
|
|
|
|
|
if((elmt_size=H5T_get_size(dt))==0)
|
|
|
|
|
HGOTO_ERROR(H5E_DATATYPE, H5E_BADSIZE, FAIL, "datatype size invalid");
|
|
|
|
|
|
|
|
|
|
/* Initialize iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if ((*space->select.iter_init)(space, elmt_size, &iter)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_DATASPACE, H5E_CANTINIT, FAIL, "unable to initialize selection iterator");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
iter_init=1; /* Selection iteration info has been initialized */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Get the number of elements in selection */
|
2002-07-31 10:27:07 -05:00
|
|
|
|
if((nelmts = (*space->select.get_npoints)(space))<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_DATASPACE, H5E_CANTCOUNT, FAIL, "can't get number of elements selected");
|
|
|
|
|
|
|
|
|
|
/* Get the rank of the dataspace */
|
|
|
|
|
ndims=space->extent.u.simple.rank;
|
|
|
|
|
|
2002-10-15 16:12:48 -05:00
|
|
|
|
if (ndims > 0){
|
|
|
|
|
/* Copy the size of the space */
|
|
|
|
|
assert(space->extent.u.simple.size);
|
|
|
|
|
HDmemcpy(space_size, space->extent.u.simple.size, ndims*sizeof(hsize_t));
|
|
|
|
|
}
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
space_size[ndims]=elmt_size;
|
|
|
|
|
|
|
|
|
|
/* Compute the maximum number of bytes required */
|
|
|
|
|
H5_ASSIGN_OVERFLOW(max_bytes,nelmts*elmt_size,hsize_t,size_t);
|
|
|
|
|
|
|
|
|
|
/* Loop, while elements left in selection */
|
|
|
|
|
while(max_bytes>0 && user_ret==0) {
|
|
|
|
|
/* Get the sequences of bytes */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*space->select.get_seq_list)(space,0,&iter,elmt_size,(size_t)vector_size,max_bytes,&nseq,&nbytes,off,len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, FAIL, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Loop, while sequences left to process */
|
|
|
|
|
for(curr_seq=0; curr_seq<nseq && user_ret==0; curr_seq++) {
|
|
|
|
|
/* Get the current offset */
|
|
|
|
|
curr_off=off[curr_seq];
|
|
|
|
|
|
|
|
|
|
/* Get the number of bytes in sequence */
|
|
|
|
|
curr_len=len[curr_seq];
|
|
|
|
|
|
|
|
|
|
/* Loop, while bytes left in sequence */
|
|
|
|
|
while(curr_len>0 && user_ret==0) {
|
|
|
|
|
/* Compute the coordinate from the offset */
|
|
|
|
|
for(i=ndims, tmp_off=curr_off; i>=0; i--) {
|
|
|
|
|
coords[i]=tmp_off%space_size[i];
|
|
|
|
|
tmp_off/=space_size[i];
|
|
|
|
|
} /* end for */
|
|
|
|
|
|
|
|
|
|
/* Get the location within the user's buffer */
|
|
|
|
|
loc=(unsigned char *)buf+curr_off;
|
|
|
|
|
|
|
|
|
|
/* Call user's callback routine */
|
|
|
|
|
user_ret=(*op)(loc,type_id,(hsize_t)ndims,coords,operator_data);
|
|
|
|
|
|
|
|
|
|
/* Increment offset in dataspace */
|
|
|
|
|
curr_off+=elmt_size;
|
|
|
|
|
|
|
|
|
|
/* Decrement number of bytes left in sequence */
|
|
|
|
|
curr_len-=elmt_size;
|
|
|
|
|
} /* end while */
|
|
|
|
|
} /* end for */
|
1999-06-23 21:16:13 -05:00
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
/* Decrement number of elements left to process */
|
|
|
|
|
assert((nbytes%elmt_size)==0);
|
|
|
|
|
max_bytes-=nbytes;
|
|
|
|
|
} /* end while */
|
1999-06-23 21:16:13 -05:00
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
/* Set return value */
|
|
|
|
|
ret_value=user_ret;
|
1999-06-23 21:16:13 -05:00
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
done:
|
|
|
|
|
/* Release selection iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if(iter_init) {
|
|
|
|
|
if ((*space->select.iter_release)(&iter)<0)
|
|
|
|
|
HDONE_ERROR (H5E_DATASPACE, H5E_CANTRELEASE, FAIL, "unable to release selection iterator");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* end if */
|
1999-06-23 21:16:13 -05:00
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
/* Release length & offset vectors */
|
|
|
|
|
if(len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,len);
|
|
|
|
|
if(off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,off);
|
1999-06-23 21:16:13 -05:00
|
|
|
|
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
1999-06-23 21:16:13 -05:00
|
|
|
|
} /* end H5S_select_iterate() */
|
|
|
|
|
|
2002-02-07 11:21:24 -05:00
|
|
|
|
|
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
|
|
|
|
H5Sget_select_type
|
|
|
|
|
PURPOSE
|
|
|
|
|
Retrieve the type of selection in a dataspace
|
|
|
|
|
USAGE
|
|
|
|
|
H5S_sel_type H5Sget_select_type(space_id)
|
|
|
|
|
hid_t space_id; IN: Dataspace object to reset
|
|
|
|
|
RETURNS
|
|
|
|
|
Non-negative on success/Negative on failure. Return value is from the
|
|
|
|
|
set of values in the H5S_sel_type enumerated type.
|
|
|
|
|
DESCRIPTION
|
|
|
|
|
This function retrieves the type of selection currently defined for
|
|
|
|
|
a dataspace.
|
|
|
|
|
--------------------------------------------------------------------------*/
|
|
|
|
|
H5S_sel_type
|
|
|
|
|
H5Sget_select_type(hid_t space_id)
|
|
|
|
|
{
|
|
|
|
|
H5S_t *space = NULL; /* dataspace to modify */
|
2002-08-08 11:52:55 -05:00
|
|
|
|
H5S_sel_type ret_value; /* Return value */
|
2002-02-07 11:21:24 -05:00
|
|
|
|
|
2002-05-29 10:07:55 -05:00
|
|
|
|
FUNC_ENTER_API(H5Sget_select_type, H5S_SEL_ERROR);
|
2002-02-07 11:21:24 -05:00
|
|
|
|
H5TRACE1("St","i",space_id);
|
|
|
|
|
|
|
|
|
|
/* Check args */
|
2002-07-31 14:17:12 -05:00
|
|
|
|
if (NULL == (space = H5I_object_verify(space_id, H5I_DATASPACE)))
|
2002-08-08 11:52:55 -05:00
|
|
|
|
HGOTO_ERROR(H5E_ATOM, H5E_BADATOM, H5S_SEL_ERROR, "not a data space");
|
2002-02-07 11:21:24 -05:00
|
|
|
|
|
2002-08-08 11:52:55 -05:00
|
|
|
|
/* Set return value */
|
|
|
|
|
ret_value=space->select.type;
|
|
|
|
|
|
|
|
|
|
done:
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_API(ret_value);
|
2002-02-07 11:21:24 -05:00
|
|
|
|
} /* end H5Sget_select_type() */
|
|
|
|
|
|
2002-04-03 12:07:14 -05:00
|
|
|
|
|
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
|
|
|
|
H5S_select_shape_same
|
|
|
|
|
PURPOSE
|
|
|
|
|
Check if two selections are the same shape
|
|
|
|
|
USAGE
|
|
|
|
|
htri_t H5S_select_shape_same(space1, space2)
|
|
|
|
|
const H5S_t *space1; IN: 1st Dataspace pointer to compare
|
|
|
|
|
const H5S_t *space2; IN: 2nd Dataspace pointer to compare
|
|
|
|
|
RETURNS
|
|
|
|
|
TRUE/FALSE/FAIL
|
|
|
|
|
DESCRIPTION
|
|
|
|
|
Checks to see if the current selection in the dataspaces are the same
|
|
|
|
|
dimensionality and shape.
|
|
|
|
|
This is primarily used for reading the entire selection in one swoop.
|
|
|
|
|
GLOBAL VARIABLES
|
|
|
|
|
COMMENTS, BUGS, ASSUMPTIONS
|
|
|
|
|
Assumes that there is only a single "block" for hyperslab selections.
|
|
|
|
|
EXAMPLES
|
|
|
|
|
REVISION LOG
|
|
|
|
|
--------------------------------------------------------------------------*/
|
|
|
|
|
htri_t
|
|
|
|
|
H5S_select_shape_same(const H5S_t *space1, const H5S_t *space2)
|
|
|
|
|
{
|
|
|
|
|
H5S_hyper_span_t *span1=NULL,*span2=NULL; /* Hyperslab span node */
|
2002-08-08 12:52:17 -05:00
|
|
|
|
hsize_t elmts1=0,elmts2=0; /* Number of elements in each dimension of selection */
|
2002-04-03 12:07:14 -05:00
|
|
|
|
unsigned u; /* Index variable */
|
|
|
|
|
htri_t ret_value=TRUE; /* return value */
|
|
|
|
|
|
2002-05-29 10:07:55 -05:00
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_shape_same, FAIL);
|
2002-04-03 12:07:14 -05:00
|
|
|
|
|
|
|
|
|
/* Check args */
|
|
|
|
|
assert(space1);
|
|
|
|
|
assert(space2);
|
|
|
|
|
|
|
|
|
|
if (space1->extent.u.simple.rank!=space2->extent.u.simple.rank)
|
|
|
|
|
HGOTO_DONE(FALSE);
|
|
|
|
|
|
|
|
|
|
/* Get information about memory and file */
|
|
|
|
|
for (u=0; u<space1->extent.u.simple.rank; u++) {
|
|
|
|
|
switch(space1->select.type) {
|
|
|
|
|
case H5S_SEL_HYPERSLABS:
|
|
|
|
|
/* Check size hyperslab selection in this dimension */
|
|
|
|
|
if(space1->select.sel_info.hslab.diminfo != NULL) {
|
|
|
|
|
elmts1=space1->select.sel_info.hslab.diminfo[u].block;
|
|
|
|
|
} /* end if */
|
|
|
|
|
else {
|
|
|
|
|
/* Check for the first dimension */
|
|
|
|
|
if(span1==NULL)
|
|
|
|
|
span1=space1->select.sel_info.hslab.span_lst->head;
|
|
|
|
|
|
|
|
|
|
/* Get the number of elements in the span */
|
|
|
|
|
elmts1=(span1->high-span1->low)+1;
|
|
|
|
|
|
|
|
|
|
/* Advance to the next dimension */
|
|
|
|
|
span1=span1->down->head;
|
|
|
|
|
} /* end else */
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SEL_ALL:
|
|
|
|
|
elmts1=space1->extent.u.simple.size[u];
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SEL_POINTS:
|
|
|
|
|
elmts1=1;
|
|
|
|
|
break;
|
|
|
|
|
|
2002-07-31 10:27:07 -05:00
|
|
|
|
case H5S_SEL_NONE:
|
|
|
|
|
elmts1=0;
|
|
|
|
|
break;
|
|
|
|
|
|
2002-04-03 12:07:14 -05:00
|
|
|
|
default:
|
|
|
|
|
assert(0 && "Invalid selection type!");
|
|
|
|
|
} /* end switch */
|
|
|
|
|
|
|
|
|
|
switch(space2->select.type) {
|
|
|
|
|
case H5S_SEL_HYPERSLABS:
|
|
|
|
|
/* Check size hyperslab selection in this dimension */
|
|
|
|
|
if(space2->select.sel_info.hslab.diminfo != NULL) {
|
|
|
|
|
elmts2=space2->select.sel_info.hslab.diminfo[u].block;
|
|
|
|
|
} /* end if */
|
|
|
|
|
else {
|
|
|
|
|
/* Check for the first dimension */
|
|
|
|
|
if(span2==NULL)
|
|
|
|
|
span2=space2->select.sel_info.hslab.span_lst->head;
|
|
|
|
|
|
|
|
|
|
/* Get the number of elements in the span */
|
|
|
|
|
elmts2=(span2->high-span2->low)+1;
|
|
|
|
|
|
|
|
|
|
/* Advance to the next dimension */
|
|
|
|
|
span2=span2->down->head;
|
|
|
|
|
} /* end else */
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SEL_ALL:
|
|
|
|
|
elmts2=space2->extent.u.simple.size[u];
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
|
|
case H5S_SEL_POINTS:
|
|
|
|
|
elmts2=1;
|
|
|
|
|
break;
|
|
|
|
|
|
2002-07-31 10:27:07 -05:00
|
|
|
|
case H5S_SEL_NONE:
|
|
|
|
|
elmts2=0;
|
|
|
|
|
break;
|
|
|
|
|
|
2002-04-03 12:07:14 -05:00
|
|
|
|
default:
|
|
|
|
|
assert(0 && "Invalid selection type!");
|
|
|
|
|
} /* end switch */
|
|
|
|
|
|
|
|
|
|
/* Make certaint the selections have the same number of elements in this dimension */
|
|
|
|
|
if (elmts1!=elmts2)
|
|
|
|
|
HGOTO_DONE(FALSE);
|
|
|
|
|
} /* end for */
|
|
|
|
|
|
|
|
|
|
done:
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
2002-04-03 12:07:14 -05:00
|
|
|
|
} /* H5S_select_shape_same() */
|
|
|
|
|
|
2002-04-09 07:47:34 -05:00
|
|
|
|
|
|
|
|
|
/*--------------------------------------------------------------------------
|
|
|
|
|
NAME
|
|
|
|
|
H5S_select_fill
|
|
|
|
|
PURPOSE
|
|
|
|
|
Fill a selection in memory with a value
|
|
|
|
|
USAGE
|
|
|
|
|
herr_t H5S_select_fill(fill,fill_size,space,buf)
|
|
|
|
|
const void *fill; IN: Pointer to fill value to use
|
|
|
|
|
size_t fill_size; IN: Size of elements in memory buffer & size of
|
|
|
|
|
fill value
|
|
|
|
|
H5S_t *space; IN: Dataspace describing memory buffer &
|
|
|
|
|
containing selection to use.
|
|
|
|
|
void *buf; IN/OUT: Memory buffer to fill selection in
|
|
|
|
|
RETURNS
|
|
|
|
|
Non-negative on success/Negative on failure.
|
|
|
|
|
DESCRIPTION
|
|
|
|
|
Use the selection in the dataspace to fill elements in a memory buffer.
|
|
|
|
|
GLOBAL VARIABLES
|
|
|
|
|
COMMENTS, BUGS, ASSUMPTIONS
|
|
|
|
|
The memory buffer elements are assumed to have the same datatype as the
|
|
|
|
|
fill value being placed into them.
|
|
|
|
|
EXAMPLES
|
|
|
|
|
REVISION LOG
|
|
|
|
|
--------------------------------------------------------------------------*/
|
|
|
|
|
herr_t
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
H5S_select_fill(void *_fill, size_t fill_size, const H5S_t *space, void *_buf)
|
2002-04-09 07:47:34 -05:00
|
|
|
|
{
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
H5S_sel_iter_t iter; /* Selection iteration info */
|
|
|
|
|
hbool_t iter_init=0; /* Selection iteration info has been initialized */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
uint8_t *buf; /* Current location in buffer */
|
2002-04-25 12:56:56 -05:00
|
|
|
|
void *fill=_fill; /* Alias for fill-value buffer */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
hssize_t nelmts; /* Number of elements in selection */
|
|
|
|
|
hsize_t *off=NULL; /* Array to store sequence offsets */
|
|
|
|
|
size_t *len=NULL; /* Array to store sequence lengths */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
ssize_t vector_size; /* Value for vector size */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t nseq; /* Number of sequences generated */
|
|
|
|
|
size_t curr_seq; /* Current sequnce being worked on */
|
|
|
|
|
size_t nbytes; /* Number of bytes used in sequences */
|
|
|
|
|
size_t max_bytes; /* Total number of bytes in selection */
|
|
|
|
|
herr_t ret_value=SUCCEED; /* return value */
|
2002-04-09 07:47:34 -05:00
|
|
|
|
|
2002-05-29 10:07:55 -05:00
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_fill, FAIL);
|
2002-04-09 07:47:34 -05:00
|
|
|
|
|
|
|
|
|
/* Check args */
|
|
|
|
|
assert(fill_size>0);
|
|
|
|
|
assert(space);
|
2002-07-24 14:09:55 -05:00
|
|
|
|
assert(_buf);
|
2002-04-09 07:47:34 -05:00
|
|
|
|
|
2002-04-25 08:15:22 -05:00
|
|
|
|
/* Check if we need a temporary fill value buffer */
|
|
|
|
|
if(fill==NULL) {
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if (NULL==(fill = H5FL_BLK_CALLOC(type_elem,fill_size)))
|
2002-04-25 08:15:22 -05:00
|
|
|
|
HGOTO_ERROR (H5E_RESOURCE, H5E_NOSPACE, FAIL, "fill value buffer allocation failed");
|
|
|
|
|
} /* end if */
|
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
/* Get the hyperslab vector size */
|
|
|
|
|
/* (from the default data transfer property list, for now) */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((vector_size=H5S_get_vector_size(H5P_DATASET_XFER_DEFAULT))<0)
|
|
|
|
|
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "unable to get I/O vector size");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Allocate the vector I/O arrays */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O offset vector array");
|
|
|
|
|
|
|
|
|
|
/* Initialize iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if ((*space->select.iter_init)(space, fill_size, &iter)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_DATASPACE, H5E_CANTINIT, FAIL, "unable to initialize selection iterator");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
iter_init=1; /* Selection iteration info has been initialized */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Get the number of elements in selection */
|
2002-07-31 10:27:07 -05:00
|
|
|
|
if((nelmts = (*space->select.get_npoints)(space))<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_DATASPACE, H5E_CANTCOUNT, FAIL, "can't get number of elements selected");
|
|
|
|
|
|
|
|
|
|
/* Compute the number of bytes to process */
|
|
|
|
|
H5_CHECK_OVERFLOW(nelmts,hssize_t,size_t);
|
|
|
|
|
max_bytes=(size_t)nelmts*fill_size;
|
|
|
|
|
|
|
|
|
|
/* Loop, while elements left in selection */
|
|
|
|
|
while(max_bytes>0) {
|
|
|
|
|
/* Get the sequences of bytes */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*space->select.get_seq_list)(space,0,&iter,fill_size,(size_t)vector_size,max_bytes,&nseq,&nbytes,off,len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, FAIL, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Loop over sequences */
|
|
|
|
|
for(curr_seq=0; curr_seq<nseq; curr_seq++) {
|
|
|
|
|
/* Get offset in memory buffer */
|
|
|
|
|
buf=(uint8_t *)_buf+off[curr_seq];
|
|
|
|
|
|
|
|
|
|
/* Fill each sequence in memory with fill value */
|
|
|
|
|
assert((len[curr_seq]%fill_size)==0);
|
|
|
|
|
H5V_array_fill(buf, fill, fill_size, (len[curr_seq]/fill_size));
|
|
|
|
|
} /* end for */
|
|
|
|
|
|
|
|
|
|
/* Decrement number of bytes left to process */
|
|
|
|
|
max_bytes-=nbytes;
|
|
|
|
|
} /* end while */
|
|
|
|
|
|
|
|
|
|
done:
|
|
|
|
|
/* Release selection iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if(iter_init) {
|
|
|
|
|
if ((*space->select.iter_release)(&iter)<0)
|
|
|
|
|
HDONE_ERROR (H5E_DATASPACE, H5E_CANTRELEASE, FAIL, "unable to release selection iterator");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Release length & offset vectors */
|
|
|
|
|
if(len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,len);
|
|
|
|
|
if(off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,off);
|
|
|
|
|
|
|
|
|
|
/* Release fill value, if allocated */
|
|
|
|
|
if(_fill==NULL && fill)
|
|
|
|
|
H5FL_BLK_FREE(type_elem,fill);
|
|
|
|
|
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* H5S_select_fill() */
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
* Function: H5S_select_fscat
|
|
|
|
|
*
|
|
|
|
|
* Purpose: Scatters dataset elements from the type conversion buffer BUF
|
|
|
|
|
* to the file F where the data points are arranged according to
|
|
|
|
|
* the file data space FILE_SPACE and stored according to
|
|
|
|
|
* LAYOUT and EFL. Each element is ELMT_SIZE bytes.
|
|
|
|
|
* The caller is requesting that NELMTS elements are copied.
|
|
|
|
|
*
|
|
|
|
|
* Return: Non-negative on success/Negative on failure
|
|
|
|
|
*
|
|
|
|
|
* Programmer: Quincey Koziol
|
|
|
|
|
* Thursday, June 20, 2002
|
|
|
|
|
*
|
|
|
|
|
* Modifications:
|
|
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
herr_t
|
[svn-r5894] Purpose:
Bug fix/Code cleanup/New Feature
Description:
Correct problems with writing fill-values to external storage and allocate
the data storage at the correct times.
Also, mostly straighten out the strange code which allocates and fills
raw data storage for datasets. Things are still a bit odd in that the
fill-values for chunked datasets are written when the space is allocated,
instead of in a separate routine, but there are two reasons for this:
it's inefficient (especially in parallel) to iterate through all the chunks
twice, and (more importantly) the space needed to store compressed chunks
isn't known until we've got a buffer of compressed fill-values ready to
write to the chunk.
Additionally, add in the H5D_SPACE_ALLOC_INCR and H5D_SPACE_ALLOC_DEFAULT
setting for the "space time", which incorporate the previous behavior of
the space allocation for chunked datasets.
The default settings for the different types of dataset storage are now
as follows:
Contiguous - Late
Chunked - Incremental
Compact - Early
This checkin also incorporates a change to the behavior of external data
storage in two ways - fill-values are _never_ written to external storage
(under the assumption that writing fill-values is triggered by allocating
space in an HDF5 file, and since space is not allocated in the file, the
fill-values should not be written) and external data files are now created
if they don't exist when data is written to them. The fill-value will
probably need to be revisited at some time in the future, this just seemed
like the safer course currently.
I think I cleaned up some compiler errors also, before getting bogged down
in the fixes for the space allocation and fill-values.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/serial & parallel. Will be testing on IRIX64
6.5 (modi4) in serial & parallel shortly.
2002-08-27 08:41:32 -05:00
|
|
|
|
H5S_select_fscat (H5F_t *f, struct H5O_layout_t *layout,
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
H5P_genplist_t *dc_plist, const H5O_efl_t *efl, size_t elmt_size,
|
|
|
|
|
const H5S_t *space, H5S_sel_iter_t *iter, hsize_t nelmts,
|
|
|
|
|
hid_t dxpl_id, const void *_buf)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
{
|
|
|
|
|
const uint8_t *buf=_buf; /* Alias for pointer arithmetic */
|
|
|
|
|
hsize_t *off=NULL; /* Array to store sequence offsets */
|
|
|
|
|
size_t *len=NULL; /* Array to store sequence lengths */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
ssize_t vector_size; /* Value for vector size */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t maxbytes; /* Number of bytes in the buffer */
|
|
|
|
|
size_t nseq; /* Number of sequences generated */
|
|
|
|
|
size_t nbytes; /* Number of bytes used in sequences */
|
|
|
|
|
herr_t ret_value=SUCCEED; /* Return value */
|
|
|
|
|
|
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_fscat, FAIL);
|
|
|
|
|
|
|
|
|
|
/* Check args */
|
|
|
|
|
assert (f);
|
|
|
|
|
assert (layout);
|
|
|
|
|
assert (elmt_size>0);
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
assert (efl);
|
2002-07-31 10:27:07 -05:00
|
|
|
|
assert (space);
|
|
|
|
|
assert (iter);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
assert (nelmts>0);
|
|
|
|
|
assert (_buf);
|
2002-11-01 13:39:20 -05:00
|
|
|
|
assert(TRUE==H5P_isa_class(dxpl_id,H5P_DATASET_XFER));
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Get the hyperslab vector size */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((vector_size=H5S_get_vector_size(dxpl_id))<0)
|
|
|
|
|
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "unable to get I/O vector size");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Allocate the vector I/O arrays */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O offset vector array");
|
|
|
|
|
|
|
|
|
|
/* Compute the number of bytes available in buffer */
|
|
|
|
|
H5_ASSIGN_OVERFLOW(maxbytes,nelmts*elmt_size,hsize_t,size_t);
|
|
|
|
|
|
|
|
|
|
/* Loop until all elements are written */
|
|
|
|
|
while(maxbytes>0) {
|
|
|
|
|
/* Get list of sequences for selection to write */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*space->select.get_seq_list)(space,H5S_GET_SEQ_LIST_SORTED,iter,elmt_size,(size_t)vector_size,maxbytes,&nseq,&nbytes,off,len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, FAIL, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Write sequence list out */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if (H5F_seq_writev(f, dxpl_id, layout, dc_plist, efl, space, elmt_size, nseq, len, off, buf)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_DATASPACE, H5E_WRITEERROR, FAIL, "write error");
|
|
|
|
|
|
|
|
|
|
/* Update buffer */
|
|
|
|
|
buf += nbytes;
|
|
|
|
|
|
|
|
|
|
/* Decrement number of elements left to process */
|
|
|
|
|
assert(nbytes%elmt_size==0);
|
|
|
|
|
maxbytes -= nbytes;
|
|
|
|
|
} /* end while */
|
|
|
|
|
|
|
|
|
|
done:
|
|
|
|
|
if(len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,len);
|
|
|
|
|
if(off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,off);
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* H5S_select_fscat() */
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
* Function: H5S_select_fgath
|
|
|
|
|
*
|
|
|
|
|
* Purpose: Gathers data points from file F and accumulates them in the
|
|
|
|
|
* type conversion buffer BUF. The LAYOUT argument describes
|
|
|
|
|
* how the data is stored on disk and EFL describes how the data
|
|
|
|
|
* is organized in external files. ELMT_SIZE is the size in
|
|
|
|
|
* bytes of a datum which this function treats as opaque.
|
|
|
|
|
* FILE_SPACE describes the data space of the dataset on disk
|
|
|
|
|
* and the elements that have been selected for reading (via
|
|
|
|
|
* hyperslab, etc). This function will copy at most NELMTS
|
|
|
|
|
* elements.
|
|
|
|
|
*
|
|
|
|
|
* Return: Success: Number of elements copied.
|
|
|
|
|
* Failure: 0
|
|
|
|
|
*
|
|
|
|
|
* Programmer: Quincey Koziol
|
|
|
|
|
* Monday, June 24, 2002
|
|
|
|
|
*
|
|
|
|
|
* Modifications:
|
|
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
hsize_t
|
|
|
|
|
H5S_select_fgath (H5F_t *f, const struct H5O_layout_t *layout,
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
H5P_genplist_t *dc_plist, const H5O_efl_t *efl, size_t elmt_size,
|
|
|
|
|
const H5S_t *space, H5S_sel_iter_t *iter, hsize_t nelmts,
|
|
|
|
|
hid_t dxpl_id, void *_buf/*out*/)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
{
|
|
|
|
|
uint8_t *buf=_buf; /* Alias for pointer arithmetic */
|
|
|
|
|
hsize_t *off=NULL; /* Array to store sequence offsets */
|
|
|
|
|
size_t *len=NULL; /* Array to store sequence lengths */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
ssize_t vector_size; /* Value for vector size */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t maxbytes; /* Number of bytes in the buffer */
|
|
|
|
|
size_t nseq; /* Number of sequences generated */
|
|
|
|
|
size_t nbytes; /* Number of bytes used in sequences */
|
|
|
|
|
hsize_t ret_value=nelmts; /* Return value */
|
|
|
|
|
|
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_fgath, 0);
|
|
|
|
|
|
|
|
|
|
/* Check args */
|
|
|
|
|
assert (f);
|
|
|
|
|
assert (layout);
|
|
|
|
|
assert (elmt_size>0);
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
assert (efl);
|
2002-07-31 10:27:07 -05:00
|
|
|
|
assert (space);
|
|
|
|
|
assert (iter);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
assert (nelmts>0);
|
|
|
|
|
assert (_buf);
|
|
|
|
|
|
|
|
|
|
/* Get the hyperslab vector size */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((vector_size=H5S_get_vector_size(dxpl_id))<0)
|
|
|
|
|
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, 0, "unable to get I/O vector size");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Allocate the vector I/O arrays */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, 0, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, 0, "can't allocate I/O offset vector array");
|
|
|
|
|
|
|
|
|
|
/* Compute the number of bytes available in buffer */
|
|
|
|
|
H5_ASSIGN_OVERFLOW(maxbytes,nelmts*elmt_size,hsize_t,size_t);
|
|
|
|
|
|
|
|
|
|
/* Loop until all elements are written */
|
|
|
|
|
while(maxbytes>0) {
|
|
|
|
|
/* Get list of sequences for selection to write */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*space->select.get_seq_list)(space,H5S_GET_SEQ_LIST_SORTED,iter,elmt_size,(size_t)vector_size,maxbytes,&nseq,&nbytes,off,len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, 0, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Read sequence list in */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if (H5F_seq_readv(f, dxpl_id, layout, dc_plist, efl, space, elmt_size, nseq, len, off, buf)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_DATASPACE, H5E_READERROR, 0, "read error");
|
|
|
|
|
|
|
|
|
|
/* Update buffer */
|
|
|
|
|
buf += nbytes;
|
|
|
|
|
|
|
|
|
|
/* Decrement number of elements left to process */
|
|
|
|
|
assert(nbytes%elmt_size==0);
|
|
|
|
|
maxbytes -= nbytes;
|
|
|
|
|
} /* end while */
|
|
|
|
|
|
|
|
|
|
done:
|
|
|
|
|
if(len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,len);
|
|
|
|
|
if(off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,off);
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* H5S_select_fgath() */
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
* Function: H5S_select_mscat
|
|
|
|
|
*
|
|
|
|
|
* Purpose: Scatters NELMTS data points from the scatter buffer
|
|
|
|
|
* TSCAT_BUF to the application buffer BUF. Each element is
|
|
|
|
|
* ELMT_SIZE bytes and they are organized in application memory
|
|
|
|
|
* according to SPACE.
|
|
|
|
|
*
|
|
|
|
|
* Return: Non-negative on success/Negative on failure
|
|
|
|
|
*
|
|
|
|
|
* Programmer: Quincey Koziol
|
|
|
|
|
* Monday, July 8, 2002
|
|
|
|
|
*
|
|
|
|
|
* Modifications:
|
|
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
herr_t
|
|
|
|
|
H5S_select_mscat (const void *_tscat_buf, size_t elmt_size, const H5S_t *space,
|
|
|
|
|
H5S_sel_iter_t *iter, hsize_t nelmts, hid_t dxpl_id, void *_buf/*out*/)
|
|
|
|
|
{
|
|
|
|
|
uint8_t *buf=(uint8_t *)_buf; /* Get local copies for address arithmetic */
|
|
|
|
|
const uint8_t *tscat_buf=(const uint8_t *)_tscat_buf;
|
|
|
|
|
hsize_t *off=NULL; /* Array to store sequence offsets */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
ssize_t vector_size; /* Value for vector size */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t *len=NULL; /* Array to store sequence lengths */
|
|
|
|
|
size_t curr_len; /* Length of bytes left to process in sequence */
|
|
|
|
|
size_t maxbytes; /* Number of bytes in the buffer */
|
|
|
|
|
size_t nseq; /* Number of sequences generated */
|
|
|
|
|
size_t curr_seq; /* Current sequence being processed */
|
|
|
|
|
size_t nbytes; /* Number of bytes used in sequences */
|
|
|
|
|
herr_t ret_value=SUCCEED; /* Number of elements scattered */
|
|
|
|
|
|
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_mscat, FAIL);
|
|
|
|
|
|
|
|
|
|
/* Check args */
|
|
|
|
|
assert (tscat_buf);
|
|
|
|
|
assert (elmt_size>0);
|
|
|
|
|
assert (space);
|
|
|
|
|
assert (iter);
|
|
|
|
|
assert (nelmts>0);
|
|
|
|
|
assert (buf);
|
|
|
|
|
|
|
|
|
|
/* Get the hyperslab vector size */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((vector_size=H5S_get_vector_size(dxpl_id))<0)
|
|
|
|
|
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "unable to get I/O vector size");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Allocate the vector I/O arrays */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O offset vector array");
|
|
|
|
|
|
|
|
|
|
/* Compute the number of bytes available in buffer */
|
|
|
|
|
H5_ASSIGN_OVERFLOW(maxbytes,nelmts*elmt_size,hsize_t,size_t);
|
|
|
|
|
|
|
|
|
|
/* Loop until all elements are written */
|
|
|
|
|
while(maxbytes>0) {
|
|
|
|
|
/* Get list of sequences for selection to write */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*space->select.get_seq_list)(space,0,iter,elmt_size,(size_t)vector_size,maxbytes,&nseq,&nbytes,off,len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, 0, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Loop, while sequences left to process */
|
|
|
|
|
for(curr_seq=0; curr_seq<nseq; curr_seq++) {
|
|
|
|
|
/* Get the number of bytes in sequence */
|
|
|
|
|
curr_len=len[curr_seq];
|
|
|
|
|
|
|
|
|
|
HDmemcpy(buf+off[curr_seq],tscat_buf,curr_len);
|
|
|
|
|
|
|
|
|
|
/* Advance offset in destination buffer */
|
|
|
|
|
tscat_buf+=curr_len;
|
|
|
|
|
} /* end for */
|
|
|
|
|
|
|
|
|
|
/* Decrement number of elements left to process */
|
|
|
|
|
assert(nbytes%elmt_size==0);
|
|
|
|
|
maxbytes -= nbytes;
|
|
|
|
|
} /* end while */
|
|
|
|
|
|
|
|
|
|
done:
|
|
|
|
|
if(len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,len);
|
|
|
|
|
if(off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,off);
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* H5S_select_mscat() */
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
* Function: H5S_select_mgath
|
|
|
|
|
*
|
|
|
|
|
* Purpose: Gathers dataset elements from application memory BUF and
|
|
|
|
|
* copies them into the gather buffer TGATH_BUF.
|
|
|
|
|
* Each element is ELMT_SIZE bytes and arranged in application
|
|
|
|
|
* memory according to SPACE.
|
|
|
|
|
* The caller is requesting that at most NELMTS be gathered.
|
|
|
|
|
*
|
|
|
|
|
* Return: Success: Number of elements copied.
|
|
|
|
|
* Failure: 0
|
|
|
|
|
*
|
|
|
|
|
* Programmer: Quincey Koziol
|
|
|
|
|
* Monday, June 24, 2002
|
|
|
|
|
*
|
|
|
|
|
* Modifications:
|
|
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
hsize_t
|
|
|
|
|
H5S_select_mgath (const void *_buf, size_t elmt_size, const H5S_t *space,
|
|
|
|
|
H5S_sel_iter_t *iter, hsize_t nelmts, hid_t dxpl_id, void *_tgath_buf/*out*/)
|
|
|
|
|
{
|
|
|
|
|
const uint8_t *buf=(const uint8_t *)_buf; /* Get local copies for address arithmetic */
|
|
|
|
|
uint8_t *tgath_buf=(uint8_t *)_tgath_buf;
|
|
|
|
|
hsize_t *off=NULL; /* Array to store sequence offsets */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
ssize_t vector_size; /* Value for vector size */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t *len=NULL; /* Array to store sequence lengths */
|
|
|
|
|
size_t curr_len; /* Length of bytes left to process in sequence */
|
|
|
|
|
size_t maxbytes; /* Number of bytes in the buffer */
|
|
|
|
|
size_t nseq; /* Number of sequences generated */
|
|
|
|
|
size_t curr_seq; /* Current sequence being processed */
|
|
|
|
|
size_t nbytes; /* Number of bytes used in sequences */
|
|
|
|
|
hsize_t ret_value=nelmts; /* Number of elements gathered */
|
|
|
|
|
|
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_mgath, 0);
|
|
|
|
|
|
|
|
|
|
/* Check args */
|
|
|
|
|
assert (buf);
|
|
|
|
|
assert (elmt_size>0);
|
|
|
|
|
assert (space);
|
|
|
|
|
assert (iter);
|
|
|
|
|
assert (nelmts>0);
|
|
|
|
|
assert (tgath_buf);
|
|
|
|
|
|
|
|
|
|
/* Get the hyperslab vector size */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((vector_size=H5S_get_vector_size(dxpl_id))<0)
|
|
|
|
|
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, 0, "unable to get I/O vector size");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Allocate the vector I/O arrays */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, 0, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, 0, "can't allocate I/O offset vector array");
|
|
|
|
|
|
|
|
|
|
/* Compute the number of bytes available in buffer */
|
|
|
|
|
H5_ASSIGN_OVERFLOW(maxbytes,nelmts*elmt_size,hsize_t,size_t);
|
|
|
|
|
|
|
|
|
|
/* Loop until all elements are written */
|
|
|
|
|
while(maxbytes>0) {
|
|
|
|
|
/* Get list of sequences for selection to write */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*space->select.get_seq_list)(space,0,iter,elmt_size,(size_t)vector_size,maxbytes,&nseq,&nbytes,off,len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, 0, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Loop, while sequences left to process */
|
|
|
|
|
for(curr_seq=0; curr_seq<nseq; curr_seq++) {
|
|
|
|
|
/* Get the number of bytes in sequence */
|
|
|
|
|
curr_len=len[curr_seq];
|
|
|
|
|
|
|
|
|
|
HDmemcpy(tgath_buf,buf+off[curr_seq],curr_len);
|
|
|
|
|
|
|
|
|
|
/* Advance offset in gather buffer */
|
|
|
|
|
tgath_buf+=curr_len;
|
|
|
|
|
} /* end for */
|
|
|
|
|
|
|
|
|
|
/* Decrement number of elements left to process */
|
|
|
|
|
assert(nbytes%elmt_size==0);
|
|
|
|
|
maxbytes -= nbytes;
|
|
|
|
|
} /* end while */
|
|
|
|
|
|
|
|
|
|
done:
|
|
|
|
|
if(len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,len);
|
|
|
|
|
if(off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,off);
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* H5S_select_mgath() */
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
* Function: H5S_select_read
|
|
|
|
|
*
|
|
|
|
|
* Purpose: Reads directly from file into application memory.
|
|
|
|
|
*
|
|
|
|
|
* Return: Non-negative on success/Negative on failure
|
|
|
|
|
*
|
|
|
|
|
* Programmer: Quincey Koziol
|
|
|
|
|
* Tuesday, July 23, 2002
|
|
|
|
|
*
|
|
|
|
|
* Modifications:
|
|
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
herr_t
|
|
|
|
|
H5S_select_read(H5F_t *f, const H5O_layout_t *layout, H5P_genplist_t *dc_plist,
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
const H5O_efl_t *efl, size_t elmt_size, const H5S_t *file_space,
|
|
|
|
|
const H5S_t *mem_space, hid_t dxpl_id, void *_buf/*out*/)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
{
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
H5S_sel_iter_t mem_iter; /* Memory selection iteration info */
|
|
|
|
|
hbool_t mem_iter_init=0; /* Memory selection iteration info has been initialized */
|
|
|
|
|
H5S_sel_iter_t file_iter; /* File selection iteration info */
|
|
|
|
|
hbool_t file_iter_init=0; /* File selection iteration info has been initialized */
|
2002-08-08 12:52:17 -05:00
|
|
|
|
uint8_t *buf=NULL; /* Local buffer pointer, for address arithmetic */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
hsize_t *mem_off=NULL; /* Array to store sequence offsets in memory */
|
|
|
|
|
hsize_t *file_off=NULL; /* Array to store sequence offsets in the file */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
ssize_t vector_size; /* Value for vector size */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t *mem_len=NULL; /* Array to store sequence lengths in memory */
|
|
|
|
|
size_t *file_len=NULL; /* Array to store sequence lengths in the file */
|
|
|
|
|
size_t maxbytes; /* Number of bytes in selection */
|
|
|
|
|
size_t mem_nseq; /* Number of sequences generated in the file */
|
|
|
|
|
size_t file_nseq; /* Number of sequences generated in memory */
|
|
|
|
|
size_t mem_nbytes; /* Number of bytes used in memory sequences */
|
|
|
|
|
size_t file_nbytes; /* Number of bytes used in file sequences */
|
|
|
|
|
size_t curr_mem_seq; /* Current memory sequence to operate on */
|
|
|
|
|
size_t curr_file_seq; /* Current file sequence to operate on */
|
|
|
|
|
size_t tmp_file_len; /* Temporary number of bytes in file sequence */
|
|
|
|
|
unsigned partial_file; /* Whether a partial file sequence was accessed */
|
2002-08-08 12:52:17 -05:00
|
|
|
|
size_t orig_file_len=0; /* Original file sequence length for partial file access */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t orig_file_seq; /* Original file sequence to operate on */
|
|
|
|
|
size_t tot_file_seq; /* Number of file sequences to access */
|
|
|
|
|
herr_t ret_value=SUCCEED; /* Return value */
|
|
|
|
|
|
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_read, FAIL);
|
|
|
|
|
|
2002-11-01 13:39:20 -05:00
|
|
|
|
/* Check args */
|
|
|
|
|
assert(f);
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
assert(efl);
|
2002-11-01 13:39:20 -05:00
|
|
|
|
assert(_buf);
|
|
|
|
|
assert(TRUE==H5P_isa_class(dxpl_id,H5P_DATASET_XFER));
|
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
/* Get the hyperslab vector size */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((vector_size=H5S_get_vector_size(dxpl_id))<0)
|
|
|
|
|
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "unable to get I/O vector size");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Allocate the vector I/O arrays */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((mem_len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((mem_off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O offset vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((file_len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((file_off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O offset vector array");
|
|
|
|
|
|
|
|
|
|
/* Initialize file iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if ((*file_space->select.iter_init)(file_space, elmt_size, &file_iter)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_DATASPACE, H5E_CANTINIT, FAIL, "unable to initialize selection iterator");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
file_iter_init=1; /* File selection iteration info has been initialized */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Initialize memory iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if ((*mem_space->select.iter_init)(mem_space, elmt_size, &mem_iter)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_DATASPACE, H5E_CANTINIT, FAIL, "unable to initialize selection iterator");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
mem_iter_init=1; /* Memory selection iteration info has been initialized */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Get number of bytes in selection */
|
2003-01-17 15:34:14 -05:00
|
|
|
|
#ifndef NDEBUG
|
|
|
|
|
{
|
|
|
|
|
hsize_t tmp_maxbytes=(*file_space->select.get_npoints)(file_space)*elmt_size;
|
|
|
|
|
H5_ASSIGN_OVERFLOW(maxbytes,tmp_maxbytes,hsize_t,size_t);
|
|
|
|
|
}
|
|
|
|
|
#else /* NDEBUG */
|
|
|
|
|
maxbytes=(size_t)((*file_space->select.get_npoints)(file_space)*elmt_size);
|
|
|
|
|
#endif /* NDEBUG */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Initialize sequence counts */
|
|
|
|
|
curr_mem_seq=curr_file_seq=0;
|
|
|
|
|
mem_nseq=file_nseq=0;
|
|
|
|
|
|
|
|
|
|
/* Loop, until all bytes are processed */
|
|
|
|
|
while(maxbytes>0) {
|
|
|
|
|
/* Check if more file sequences are needed */
|
|
|
|
|
if(curr_file_seq>=file_nseq) {
|
|
|
|
|
/* Get sequences for file selection */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*file_space->select.get_seq_list)(file_space,H5S_GET_SEQ_LIST_SORTED,&file_iter,elmt_size,(size_t)vector_size,maxbytes,&file_nseq,&file_nbytes,file_off,file_len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, FAIL, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Start at the beginning of the sequences again */
|
|
|
|
|
curr_file_seq=0;
|
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Check if more memory sequences are needed */
|
|
|
|
|
if(curr_mem_seq>=mem_nseq) {
|
|
|
|
|
/* Get sequences for memory selection */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*mem_space->select.get_seq_list)(mem_space,0,&mem_iter,elmt_size,(size_t)vector_size,maxbytes,&mem_nseq,&mem_nbytes,mem_off,mem_len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, FAIL, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Start at the beginning of the sequences again */
|
|
|
|
|
curr_mem_seq=0;
|
|
|
|
|
|
|
|
|
|
/* Set the buffer pointer using the first sequence */
|
|
|
|
|
H5_CHECK_OVERFLOW(mem_off[0],hsize_t,size_t);
|
|
|
|
|
buf=(uint8_t *)_buf+(size_t)mem_off[0];
|
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Check if current file sequence will fit into current memory sequence */
|
|
|
|
|
if(mem_len[curr_mem_seq]>=file_len[curr_file_seq]) {
|
|
|
|
|
/* Save the current number file sequence */
|
|
|
|
|
orig_file_seq=curr_file_seq;
|
|
|
|
|
|
|
|
|
|
/* Determine how many file sequences will fit into current memory sequence */
|
|
|
|
|
tmp_file_len=0;
|
|
|
|
|
tot_file_seq=0;
|
2002-09-13 11:27:09 -05:00
|
|
|
|
while( curr_file_seq<file_nseq && (tmp_file_len+file_len[curr_file_seq])<=mem_len[curr_mem_seq] ) {
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
tmp_file_len+=file_len[curr_file_seq];
|
|
|
|
|
curr_file_seq++;
|
|
|
|
|
tot_file_seq++;
|
|
|
|
|
} /* end while */
|
|
|
|
|
|
|
|
|
|
/* Check for partial file sequence */
|
|
|
|
|
if(tmp_file_len<mem_len[curr_mem_seq] && curr_file_seq<file_nseq) {
|
|
|
|
|
/* Get the original file sequence length */
|
|
|
|
|
orig_file_len=file_len[curr_file_seq];
|
|
|
|
|
|
|
|
|
|
/* Make the last file sequence a partial access */
|
|
|
|
|
file_len[curr_file_seq]=mem_len[curr_mem_seq]-tmp_file_len;
|
|
|
|
|
|
|
|
|
|
/* Increase the number of bytes to access */
|
|
|
|
|
tmp_file_len=mem_len[curr_mem_seq];
|
|
|
|
|
|
|
|
|
|
/* Indicate that there is an extra sequence to include in the file access */
|
|
|
|
|
tot_file_seq++;
|
|
|
|
|
|
|
|
|
|
/* Indicate a partial file sequence */
|
|
|
|
|
partial_file=1;
|
|
|
|
|
} /* end if */
|
|
|
|
|
else
|
|
|
|
|
partial_file=0;
|
|
|
|
|
|
|
|
|
|
/* Read file sequences into current memory sequence */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if (H5F_seq_readv(f, dxpl_id, layout, dc_plist, efl, file_space, elmt_size, tot_file_seq, &file_len[orig_file_seq], &file_off[orig_file_seq], buf)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_DATASPACE, H5E_READERROR, FAIL, "read error");
|
|
|
|
|
|
|
|
|
|
/* Update last file sequence, if it was partially accessed */
|
|
|
|
|
if(partial_file) {
|
|
|
|
|
file_off[curr_file_seq]+=orig_file_len-file_len[curr_file_seq];
|
|
|
|
|
file_len[curr_file_seq]=orig_file_len-file_len[curr_file_seq];
|
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Check if the current memory sequence was only partially accessed */
|
|
|
|
|
if(tmp_file_len<mem_len[curr_mem_seq]) {
|
|
|
|
|
/* Adjust current memory sequence */
|
|
|
|
|
mem_off[curr_mem_seq]+=tmp_file_len;
|
|
|
|
|
mem_len[curr_mem_seq]-=tmp_file_len;
|
|
|
|
|
|
|
|
|
|
/* Adjust memory buffer pointer */
|
|
|
|
|
buf+=tmp_file_len;
|
|
|
|
|
} /* end if */
|
|
|
|
|
else {
|
|
|
|
|
/* Must have used entire memory sequence, advance to next one */
|
|
|
|
|
curr_mem_seq++;
|
|
|
|
|
|
|
|
|
|
/* Check if it is valid to adjust buffer pointer */
|
|
|
|
|
if(curr_mem_seq<mem_nseq) {
|
|
|
|
|
H5_CHECK_OVERFLOW(mem_off[curr_mem_seq],hsize_t,size_t);
|
|
|
|
|
buf=(uint8_t *)_buf+(size_t)mem_off[curr_mem_seq];
|
|
|
|
|
} /* end if */
|
|
|
|
|
} /* end else */
|
|
|
|
|
|
|
|
|
|
/* Decrement number of bytes left to process */
|
|
|
|
|
maxbytes-=tmp_file_len;
|
|
|
|
|
} /* end if */
|
|
|
|
|
else {
|
|
|
|
|
/* Save number of bytes to access */
|
|
|
|
|
tmp_file_len=mem_len[curr_mem_seq];
|
|
|
|
|
|
|
|
|
|
/* Read part of current file sequence into current memory sequence */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if (H5F_seq_read(f, dxpl_id, layout, dc_plist, efl, file_space, elmt_size, tmp_file_len, file_off[curr_file_seq], buf)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_DATASPACE, H5E_READERROR, FAIL, "read error");
|
|
|
|
|
|
|
|
|
|
/* Update current file sequence information */
|
|
|
|
|
file_off[curr_file_seq]+=tmp_file_len;
|
|
|
|
|
file_len[curr_file_seq]-=tmp_file_len;
|
|
|
|
|
|
|
|
|
|
/* Increment memory sequence */
|
|
|
|
|
curr_mem_seq++;
|
|
|
|
|
|
|
|
|
|
/* Check if it is valid to adjust buffer pointer */
|
|
|
|
|
if(curr_mem_seq<mem_nseq) {
|
|
|
|
|
H5_CHECK_OVERFLOW(mem_off[curr_mem_seq],hsize_t,size_t);
|
|
|
|
|
buf=(uint8_t *)_buf+(size_t)mem_off[curr_mem_seq];
|
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Decrement number of bytes left to process */
|
|
|
|
|
maxbytes-=tmp_file_len;
|
|
|
|
|
} /* end else */
|
|
|
|
|
} /* end while */
|
|
|
|
|
|
|
|
|
|
done:
|
|
|
|
|
/* Release file selection iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if(file_iter_init) {
|
|
|
|
|
if ((*file_space->select.iter_release)(&file_iter)<0)
|
|
|
|
|
HDONE_ERROR (H5E_DATASPACE, H5E_CANTRELEASE, FAIL, "unable to release selection iterator");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Release memory selection iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if(mem_iter_init) {
|
|
|
|
|
if ((*mem_space->select.iter_release)(&mem_iter)<0)
|
|
|
|
|
HDONE_ERROR (H5E_DATASPACE, H5E_CANTRELEASE, FAIL, "unable to release selection iterator");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Free vector arrays */
|
|
|
|
|
if(file_len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,file_len);
|
|
|
|
|
if(file_off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,file_off);
|
|
|
|
|
if(mem_len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,mem_len);
|
|
|
|
|
if(mem_off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,mem_off);
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* end H5S_select_read() */
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
* Function: H5S_select_write
|
|
|
|
|
*
|
|
|
|
|
* Purpose: Writes directly from application memory into a file
|
|
|
|
|
*
|
|
|
|
|
* Return: Non-negative on success/Negative on failure
|
|
|
|
|
*
|
|
|
|
|
* Programmer: Quincey Koziol
|
|
|
|
|
* Tuesday, July 23, 2002
|
|
|
|
|
*
|
|
|
|
|
* Modifications:
|
|
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
herr_t
|
[svn-r5894] Purpose:
Bug fix/Code cleanup/New Feature
Description:
Correct problems with writing fill-values to external storage and allocate
the data storage at the correct times.
Also, mostly straighten out the strange code which allocates and fills
raw data storage for datasets. Things are still a bit odd in that the
fill-values for chunked datasets are written when the space is allocated,
instead of in a separate routine, but there are two reasons for this:
it's inefficient (especially in parallel) to iterate through all the chunks
twice, and (more importantly) the space needed to store compressed chunks
isn't known until we've got a buffer of compressed fill-values ready to
write to the chunk.
Additionally, add in the H5D_SPACE_ALLOC_INCR and H5D_SPACE_ALLOC_DEFAULT
setting for the "space time", which incorporate the previous behavior of
the space allocation for chunked datasets.
The default settings for the different types of dataset storage are now
as follows:
Contiguous - Late
Chunked - Incremental
Compact - Early
This checkin also incorporates a change to the behavior of external data
storage in two ways - fill-values are _never_ written to external storage
(under the assumption that writing fill-values is triggered by allocating
space in an HDF5 file, and since space is not allocated in the file, the
fill-values should not be written) and external data files are now created
if they don't exist when data is written to them. The fill-value will
probably need to be revisited at some time in the future, this just seemed
like the safer course currently.
I think I cleaned up some compiler errors also, before getting bogged down
in the fixes for the space allocation and fill-values.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/serial & parallel. Will be testing on IRIX64
6.5 (modi4) in serial & parallel shortly.
2002-08-27 08:41:32 -05:00
|
|
|
|
H5S_select_write(H5F_t *f, H5O_layout_t *layout, H5P_genplist_t *dc_plist,
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
const H5O_efl_t *efl, size_t elmt_size, const H5S_t *file_space,
|
|
|
|
|
const H5S_t *mem_space, hid_t dxpl_id, const void *_buf/*out*/)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
{
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
H5S_sel_iter_t mem_iter; /* Memory selection iteration info */
|
|
|
|
|
hbool_t mem_iter_init=0; /* Memory selection iteration info has been initialized */
|
|
|
|
|
H5S_sel_iter_t file_iter; /* File selection iteration info */
|
|
|
|
|
hbool_t file_iter_init=0; /* File selection iteration info has been initialized */
|
2002-08-08 12:52:17 -05:00
|
|
|
|
const uint8_t *buf=NULL; /* Local buffer pointer, for address arithmetic */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
hsize_t *mem_off=NULL; /* Array to store sequence offsets in memory */
|
|
|
|
|
hsize_t *file_off=NULL; /* Array to store sequence offsets in the file */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
ssize_t vector_size; /* Value for vector size */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t *mem_len=NULL; /* Array to store sequence lengths in memory */
|
|
|
|
|
size_t *file_len=NULL; /* Array to store sequence lengths in the file */
|
|
|
|
|
size_t maxbytes; /* Number of bytes in selection */
|
|
|
|
|
size_t mem_nseq; /* Number of sequences generated in the file */
|
|
|
|
|
size_t file_nseq; /* Number of sequences generated in memory */
|
|
|
|
|
size_t mem_nbytes; /* Number of bytes used in memory sequences */
|
|
|
|
|
size_t file_nbytes; /* Number of bytes used in file sequences */
|
|
|
|
|
size_t curr_mem_seq; /* Current memory sequence to operate on */
|
|
|
|
|
size_t curr_file_seq; /* Current file sequence to operate on */
|
|
|
|
|
size_t tmp_file_len; /* Temporary number of bytes in file sequence */
|
|
|
|
|
unsigned partial_file; /* Whether a partial file sequence was accessed */
|
2002-08-08 12:52:17 -05:00
|
|
|
|
size_t orig_file_len=0; /* Original file sequence length for partial file access */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
size_t orig_file_seq; /* Original file sequence to operate on */
|
|
|
|
|
size_t tot_file_seq; /* Number of file sequences to access */
|
|
|
|
|
herr_t ret_value=SUCCEED; /* Return value */
|
|
|
|
|
|
|
|
|
|
FUNC_ENTER_NOAPI(H5S_select_write, FAIL);
|
|
|
|
|
|
2002-11-01 13:39:20 -05:00
|
|
|
|
/* Check args */
|
|
|
|
|
assert(f);
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
assert(efl);
|
2002-11-01 13:39:20 -05:00
|
|
|
|
assert(_buf);
|
|
|
|
|
assert(TRUE==H5P_isa_class(dxpl_id,H5P_DATASET_XFER));
|
|
|
|
|
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
/* Get the hyperslab vector size */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((vector_size=H5S_get_vector_size(dxpl_id))<0)
|
|
|
|
|
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "unable to get I/O vector size");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Allocate the vector I/O arrays */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((mem_len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((mem_off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O offset vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((file_len = H5FL_ARR_MALLOC(size_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O length vector array");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((file_off = H5FL_ARR_MALLOC(hsize_t,(size_t)vector_size))==NULL)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "can't allocate I/O offset vector array");
|
|
|
|
|
|
|
|
|
|
/* Initialize file iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if ((*file_space->select.iter_init)(file_space, elmt_size, &file_iter)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_DATASPACE, H5E_CANTINIT, FAIL, "unable to initialize selection iterator");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
file_iter_init=1; /* File selection iteration info has been initialized */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Initialize memory iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if ((*mem_space->select.iter_init)(mem_space, elmt_size, &mem_iter)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_DATASPACE, H5E_CANTINIT, FAIL, "unable to initialize selection iterator");
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
mem_iter_init=1; /* Memory selection iteration info has been initialized */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Get number of bytes in selection */
|
2003-01-17 15:34:14 -05:00
|
|
|
|
#ifndef NDEBUG
|
|
|
|
|
{
|
|
|
|
|
hsize_t tmp_maxbytes=(*file_space->select.get_npoints)(file_space)*elmt_size;
|
|
|
|
|
H5_ASSIGN_OVERFLOW(maxbytes,tmp_maxbytes,hsize_t,size_t);
|
|
|
|
|
}
|
|
|
|
|
#else /* NDEBUG */
|
|
|
|
|
maxbytes=(size_t)((*file_space->select.get_npoints)(file_space)*elmt_size);
|
|
|
|
|
#endif /* NDEBUG */
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
|
|
|
|
|
/* Initialize sequence counts */
|
|
|
|
|
curr_mem_seq=curr_file_seq=0;
|
|
|
|
|
mem_nseq=file_nseq=0;
|
|
|
|
|
|
|
|
|
|
/* Loop, until all bytes are processed */
|
|
|
|
|
while(maxbytes>0) {
|
|
|
|
|
/* Check if more file sequences are needed */
|
|
|
|
|
if(curr_file_seq>=file_nseq) {
|
|
|
|
|
/* Get sequences for file selection */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*file_space->select.get_seq_list)(file_space,H5S_GET_SEQ_LIST_SORTED,&file_iter,elmt_size,(size_t)vector_size,maxbytes,&file_nseq,&file_nbytes,file_off,file_len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, FAIL, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Start at the beginning of the sequences again */
|
|
|
|
|
curr_file_seq=0;
|
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Check if more memory sequences are needed */
|
|
|
|
|
if(curr_mem_seq>=mem_nseq) {
|
|
|
|
|
/* Get sequences for memory selection */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if((*mem_space->select.get_seq_list)(mem_space,0,&mem_iter,elmt_size,(size_t)vector_size,maxbytes,&mem_nseq,&mem_nbytes,mem_off,mem_len)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR (H5E_INTERNAL, H5E_UNSUPPORTED, FAIL, "sequence length generation failed");
|
|
|
|
|
|
|
|
|
|
/* Start at the beginning of the sequences again */
|
|
|
|
|
curr_mem_seq=0;
|
|
|
|
|
|
|
|
|
|
/* Set the buffer pointer using the first sequence */
|
|
|
|
|
H5_CHECK_OVERFLOW(mem_off[0],hsize_t,size_t);
|
|
|
|
|
buf=(const uint8_t *)_buf+(size_t)mem_off[0];
|
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Check if current file sequence will fit into current memory sequence */
|
|
|
|
|
if(mem_len[curr_mem_seq]>=file_len[curr_file_seq]) {
|
|
|
|
|
/* Save the current number file sequence */
|
|
|
|
|
orig_file_seq=curr_file_seq;
|
|
|
|
|
|
|
|
|
|
/* Determine how many file sequences will fit into current memory sequence */
|
|
|
|
|
tmp_file_len=0;
|
|
|
|
|
tot_file_seq=0;
|
2002-09-13 11:27:09 -05:00
|
|
|
|
while( curr_file_seq<file_nseq && (tmp_file_len+file_len[curr_file_seq])<=mem_len[curr_mem_seq] ) {
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
tmp_file_len+=file_len[curr_file_seq];
|
|
|
|
|
curr_file_seq++;
|
|
|
|
|
tot_file_seq++;
|
|
|
|
|
} /* end while */
|
|
|
|
|
|
|
|
|
|
/* Check for partial file sequence */
|
|
|
|
|
if(tmp_file_len<mem_len[curr_mem_seq] && curr_file_seq<file_nseq) {
|
|
|
|
|
/* Get the original file sequence length */
|
|
|
|
|
orig_file_len=file_len[curr_file_seq];
|
|
|
|
|
|
|
|
|
|
/* Make the last file sequence a partial access */
|
|
|
|
|
file_len[curr_file_seq]=mem_len[curr_mem_seq]-tmp_file_len;
|
|
|
|
|
|
|
|
|
|
/* Increase the number of bytes to access */
|
|
|
|
|
tmp_file_len=mem_len[curr_mem_seq];
|
|
|
|
|
|
|
|
|
|
/* Indicate that there is an extra sequence to include in the file access */
|
|
|
|
|
tot_file_seq++;
|
|
|
|
|
|
|
|
|
|
/* Indicate a partial file sequence */
|
|
|
|
|
partial_file=1;
|
|
|
|
|
} /* end if */
|
|
|
|
|
else
|
|
|
|
|
partial_file=0;
|
|
|
|
|
|
|
|
|
|
/* Write current memory sequence into file sequences */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if (H5F_seq_writev(f, dxpl_id, layout, dc_plist, efl, file_space, elmt_size, tot_file_seq, &file_len[orig_file_seq], &file_off[orig_file_seq], buf)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_DATASPACE, H5E_WRITEERROR, FAIL, "write error");
|
|
|
|
|
|
|
|
|
|
/* Update last file sequence, if it was partially accessed */
|
|
|
|
|
if(partial_file) {
|
|
|
|
|
file_off[curr_file_seq]+=orig_file_len-file_len[curr_file_seq];
|
|
|
|
|
file_len[curr_file_seq]=orig_file_len-file_len[curr_file_seq];
|
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Check if the current memory sequence was only partially accessed */
|
|
|
|
|
if(tmp_file_len<mem_len[curr_mem_seq]) {
|
|
|
|
|
/* Adjust current memory sequence */
|
|
|
|
|
mem_off[curr_mem_seq]+=tmp_file_len;
|
|
|
|
|
mem_len[curr_mem_seq]-=tmp_file_len;
|
|
|
|
|
|
|
|
|
|
/* Adjust memory buffer pointer */
|
|
|
|
|
buf+=tmp_file_len;
|
|
|
|
|
} /* end if */
|
|
|
|
|
else {
|
|
|
|
|
/* Must have used entire memory sequence, advance to next one */
|
|
|
|
|
curr_mem_seq++;
|
|
|
|
|
|
|
|
|
|
/* Check if it is valid to adjust buffer pointer */
|
|
|
|
|
if(curr_mem_seq<mem_nseq) {
|
|
|
|
|
H5_CHECK_OVERFLOW(mem_off[curr_mem_seq],hsize_t,size_t);
|
|
|
|
|
buf=(const uint8_t *)_buf+(size_t)mem_off[curr_mem_seq];
|
|
|
|
|
} /* end if */
|
|
|
|
|
} /* end else */
|
|
|
|
|
|
|
|
|
|
/* Decrement number of bytes left to process */
|
|
|
|
|
maxbytes-=tmp_file_len;
|
|
|
|
|
} /* end if */
|
|
|
|
|
else {
|
|
|
|
|
/* Save number of bytes to access */
|
|
|
|
|
tmp_file_len=mem_len[curr_mem_seq];
|
|
|
|
|
|
|
|
|
|
/* Write part of current memory sequence to current file sequence */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if (H5F_seq_write(f, dxpl_id, layout, dc_plist, efl, file_space, elmt_size, tmp_file_len, file_off[curr_file_seq], buf)<0)
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
HGOTO_ERROR(H5E_DATASPACE, H5E_WRITEERROR, FAIL, "write error");
|
|
|
|
|
|
|
|
|
|
/* Update current file sequence information */
|
|
|
|
|
file_off[curr_file_seq]+=tmp_file_len;
|
|
|
|
|
file_len[curr_file_seq]-=tmp_file_len;
|
|
|
|
|
|
|
|
|
|
/* Increment memory sequence */
|
|
|
|
|
curr_mem_seq++;
|
|
|
|
|
|
|
|
|
|
/* Check if it is valid to adjust buffer pointer */
|
|
|
|
|
if(curr_mem_seq<mem_nseq) {
|
|
|
|
|
H5_CHECK_OVERFLOW(mem_off[curr_mem_seq],hsize_t,size_t);
|
|
|
|
|
buf=(const uint8_t *)_buf+(size_t)mem_off[curr_mem_seq];
|
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Decrement number of bytes left to process */
|
|
|
|
|
maxbytes-=tmp_file_len;
|
|
|
|
|
} /* end else */
|
|
|
|
|
} /* end while */
|
|
|
|
|
|
|
|
|
|
done:
|
|
|
|
|
/* Release file selection iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if(file_iter_init) {
|
|
|
|
|
if ((*file_space->select.iter_release)(&file_iter)<0)
|
|
|
|
|
HDONE_ERROR (H5E_DATASPACE, H5E_CANTRELEASE, FAIL, "unable to release selection iterator");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Release memory selection iterator */
|
[svn-r6252] Purpose:
Lots of performance improvements & a couple new internal API interfaces.
Description:
Performance Improvements:
- Cached file offset & length sizes in shared file struct, to avoid
constantly looking them up in the FCPL.
- Generic property improvements:
- Added "revision" number to generic property classes to speed
up comparisons.
- Changed method of storing properties from using a hash-table
to the TBBT routines in the library.
- Share the propery names between classes and the lists derived
from them.
- Removed redundant 'def_value' buffer from each property.
- Switching code to use a "copy on write" strategy for
properties in each list, where the properties in each list
are shared with the properties in the class, until a
property's value is changed in a list.
- Fixed error in layout code which was allocating too many buffers.
- Redefined public macros of the form (H5open()/H5check, <variable>)
internally to only be (<variable>), avoiding innumerable useless
calls to H5open() and H5check_version().
- Reuse already zeroed buffers in H5F_contig_fill instead of
constantly re-zeroing them.
- Don't write fill values if writing entire dataset.
- Use gettimeofday() system call instead of time() system when
checking the modification time of a dataset.
- Added reference counted string API and use it for tracking the
names of objects opening in a file (for the ID->name code).
- Removed redundant H5P_get() calls in B-tree routines.
- Redefine H5T datatype macros internally to the library, to avoid
calling H5check redundantly.
- Keep dataspace information for dataset locally instead of reading
from disk each time. Added new module to track open objects
in a file, to allow this (which will be useful eventually for
some FPH5 metadata caching issues).
- Remove H5AC_find macro which was inlining metadata cache lookups,
and call function instead.
- Remove redundant memset() calls from H5G_namei() routine.
- Remove redundant checking of object type when locating objects
in metadata cache and rely on the address only.
- Create default dataset object to use when default dataset creation
property list is used to create datasets, bypassing querying
for all the property list values.
- Use default I/O vector size when performing raw data with the
default dataset transfer property list, instead of querying for
I/O vector size.
- Remove H5P_DEFAULT internally to the library, replacing it with
more specific default property list based on the type of
property list needed.
- Remove redundant memset() calls in object header message (H5O*)
routines.
- Remove redunant memset() calls in data I/O routines.
- Split free-list allocation routines into malloc() and calloc()-
like routines, instead of one combined routine.
- Remove lots of indirection in H5O*() routines.
- Simplify metadata cache entry comparison routine (used when
flushing entire cache out).
- Only enable metadata cache statistics when H5AC_DEBUG is turned
on, instead of always tracking them.
- Simplify address comparison macro (H5F_addr_eq).
- Remove redundant metadata cache entry protections during dataset
creation by protecting the object header once and making all
the modifications necessary for the dataset creation before
unprotecting it.
- Reduce # of "number of element in extent" computations performed
by computing and storing the value during dataspace creation.
- Simplify checking for group location's file information, when file
has not been involving in file-mounting operations.
- Use binary encoding for modification time, instead of ASCII.
- Hoist H5HL_peek calls (to get information in a local heap)
out of loops in many group routine.
- Use static variable for iterators of selections, instead of
dynamically allocation them each time.
- Lookup & insert new entries in one step, avoiding traversing
group's B-tree twice.
- Fixed memory leak in H5Gget_objname_idx() routine (tangential to
performance improvements, but fixed along the way).
- Use free-list for reference counted strings.
- Don't bother copying object names into cached group entries,
since they are re-created when an object is opened.
The benchmark I used to measure these results created several thousand
small (2K) datasets in a file and wrote out the data for them. This is
Elena's "regular.c" benchmark.
These changes resulted in approximately ~4.3x speedup of the
development branch when compared to the previous code in the
development branch and ~1.4x speedup compared to the release
branch.
Additionally, these changes reduce the total memory used (code and
data) by the development branch by ~800KB, bringing the development
branch back into the same ballpark as the release branch.
I'll send out a more detailed description of the benchmark results
as a followup note.
New internal API routines:
Added "reference counted strings" API for tracking strings that get
used by multiple owners without duplicating the strings.
Added "ternary search tree" API for text->object mappings.
Platforms tested:
Tested h5committest {arabica (fortran), eirene (fortran, C++)
modi4 (parallel, fortran)}
Other platforms/configurations tested?
FreeBSD 4.7 (sleipnir) serial & parallel
Solaris 2.6 (baldric) serial
2003-01-09 12:20:03 -05:00
|
|
|
|
if(mem_iter_init) {
|
|
|
|
|
if ((*mem_space->select.iter_release)(&mem_iter)<0)
|
|
|
|
|
HDONE_ERROR (H5E_DATASPACE, H5E_CANTRELEASE, FAIL, "unable to release selection iterator");
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* end if */
|
|
|
|
|
|
|
|
|
|
/* Free vector arrays */
|
|
|
|
|
if(file_len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,file_len);
|
|
|
|
|
if(file_off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,file_off);
|
|
|
|
|
if(mem_len!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(size_t,mem_len);
|
|
|
|
|
if(mem_off!=NULL)
|
|
|
|
|
H5FL_ARR_FREE(hsize_t,mem_off);
|
2003-01-10 15:26:02 -05:00
|
|
|
|
FUNC_LEAVE_NOAPI(ret_value);
|
[svn-r5834] Purpose:
Large code cleanup/re-write
Description:
This is phase 1 of the data I/O re-architecture, with the following changes:
- Changed the selection drivers to not actually do any I/O, they
only generate the sequences of offset/length pairs needed for
the I/O (or memory access, in the case of iterating or filling
a selection in a memory buffer)
- Wrote more abstract I/O routines which get the sequence of offset/
length pairs for each selection and access perform the I/O or
memory access.
Benefits of this change include:
- Removed ~3400 lines of quite redundant code, with corresponding
reduction in the size of library binary.
- Any selection can now directly access memory when performing I/O,
if no type conversions are required, instead of just "regular"
hyperslab and 'all' selections, which speeds up I/O.
- Sped up I/O for hyperslab selections which have contiguous lower
dimensions by "flattening" them out into lesser dimensional objects
for the I/O.
No file format or API changes were necessary for this change.
The next phase will be to create a "selection driver" for each type of
selection, allowing each type of selection to directly call certain
methods that only apply to that type of selection, instead of passing
through dozens of functions which have switch statements to call the
appropriate method for each selection type. This will also reduce
the amount of code in the library and speed things up a bit more.
Phase 3 will involve generating an MPI datatype for all types of selections,
instead of only "regular" hyperslab and 'all' selections. This will
allow collective parallel I/O for all I/O operations which don't
require type conversions. It will also open up the door for allowing
collective I/O on datasets which require type conversion.
Phase 4 will involve changing the access pattern to deal with chunked
datasets in a more optimal way (in serial).
Phase 5 will deal with accessing chunked datasets more optimally for
collective parallel I/O operations.
Platforms tested:
FreeBSD 4.6 (sleipnir) w/ parallel & C++ and IRIX64 6.5 (modi4) w/parallel
2002-07-24 13:56:48 -05:00
|
|
|
|
} /* end H5S_select_write() */
|
2002-04-09 07:47:34 -05:00
|
|
|
|
|