[svn-r11470] Purpose:

Repair synchronization bug in the metadata cache in PHDF5

Also repair numerous other bugs that surfaced in testing the
bug fix.


Description:

While operations modifying metadata must be collective, we allow
independant reads.  This allows metadata caches on different processes
to adjust to different sizes, and to place the entries on their dirty
lists in different orders.  Since only process 0 actually writes
metadata to disk (all other processes thought they did, but the writes
were discarded on the theory that they had to be collective), this made
it possible for another process to modify metadata, flush it, and then
read it back in in its original form (pre-modification) form.  The
possibilities for file corruption should be obvious.


Solution:

Make the policy that only process 0 can write to file explicit, and
visible to the metadata caches.  Thus only process 0 may flush dirty
entries -- all other caches must retain dirty entries until they are
informed by process 0 that the entries are clean.

Synchronization is handled by counting the bytes of dirty cache entries
created, and then synching up between the caches whenever the sum
exceeds an (eventually user specified) limit.  Dirty metadata creation
is consistent across all processes because all operations modifying
metadata must be collective.

This change uncovered may bugs which are repaired in this checkin.
It also required modification of H5HL and H5O to allocate file space
on insertion rather than on flush from cache.


Platforms tested:

H5committest, heping(parallel & serial)

Misc. update:
This commit is contained in:
John Mainzer 2005-09-27 00:20:11 -05:00
parent f9fc749ca2
commit c100b0bf26
21 changed files with 3998 additions and 416 deletions

View File

@ -188,11 +188,11 @@ main (void)
/*
* Attach to the string attribute using its index, then read and display the value.
*/
attr = H5Aopen_idx(dataset, 2);
attr = H5Aopen_idx(dataset, 1);
atype = H5Tcopy(H5T_C_S1);
H5Tset_size(atype, 5);
ret = H5Aread(attr, atype, string_out);
printf("The value of the attribute with index 2 is %s \n", string_out);
printf("The value of the attribute with index 1 is %s \n", string_out);
ret = H5Aclose(attr);
ret = H5Tclose(atype);

2122
src/H5AC.c

File diff suppressed because it is too large Load Diff

View File

@ -182,7 +182,7 @@ extern hid_t H5AC_ind_dxpl_id;
/* int version = */ H5C__CURR_AUTO_SIZE_CTL_VER, \
/* hbool_t rpt_fcn_enabled = */ FALSE, \
/* hbool_t set_initial_size = */ TRUE, \
/* size_t initial_size = */ (1 * 1024 * 1024), \
/* size_t initial_size = */ ( 1 * 1024 * 1024), \
/* double min_clean_fraction = */ 0.25, \
/* size_t max_size = */ (16 * 1024 * 1024), \
/* size_t min_size = */ ( 1 * 1024 * 1024), \
@ -216,6 +216,7 @@ extern hid_t H5AC_ind_dxpl_id;
#define H5AC__SET_FLUSH_MARKER_FLAG H5C__SET_FLUSH_MARKER_FLAG
#define H5AC__DELETED_FLAG H5C__DELETED_FLAG
#define H5AC__DIRTIED_FLAG H5C__DIRTIED_FLAG
#define H5AC__SIZE_CHANGED_FLAG H5C__SIZE_CHANGED_FLAG
#define H5AC__FLUSH_INVALIDATE_FLAG H5C__FLUSH_INVALIDATE_FLAG
#define H5AC__FLUSH_CLEAR_ONLY_FLAG H5C__FLUSH_CLEAR_ONLY_FLAG
#define H5AC__FLUSH_MARKED_ENTRIES_FLAG H5C__FLUSH_MARKED_ENTRIES_FLAG
@ -235,6 +236,7 @@ H5_DLL herr_t H5AC_unprotect(H5F_t *f, hid_t dxpl_id,
H5_DLL herr_t H5AC_flush(H5F_t *f, hid_t dxpl_id, unsigned flags);
H5_DLL herr_t H5AC_rename(H5F_t *f, const H5AC_class_t *type,
haddr_t old_addr, haddr_t new_addr);
H5_DLL herr_t H5AC_dest(H5F_t *f, hid_t dxpl_id);
H5_DLL herr_t H5AC_stats(const H5F_t *f);

1053
src/H5C.c

File diff suppressed because it is too large Load Diff

View File

@ -79,9 +79,18 @@
*
* JRM - 7/19/04
*
* The TBBT has since been replaced with a skip list. This change
* greatly predates this note.
*
* JRM - 9/26/05
*
* magic: Unsigned 32 bit integer always set to H5C__H5C_T_MAGIC. This
* field is used to validate pointers to instances of H5C_t.
*
* aux_ptr: Pointer to void used to allow wrapper code to associate
* its data with an instance of H5C_t. The H5C cache code
* sets this field to NULL, and otherwise leaves it alone.
*
* max_type_id: Integer field containing the maximum type id number assigned
* to a type of entry in the cache. All type ids from 0 to
* max_type_id inclusive must be defined. The names of the
@ -110,6 +119,10 @@
* will attempt to reduce its size to the max_cache_size
* limit on the next cache write.
*
* c) When an entry increases in size, the cache may exceed
* the max_cache_size limit until the next time the cache
* attempts to load or insert an entry.
*
* min_clean_size: Nominal minimum number of clean bytes in the cache.
* The cache attempts to maintain this number of bytes of
* clean data so as to avoid case b) above. Again, this is
@ -126,7 +139,14 @@
* a write is permissible at any given point in time.
*
* If no such function is specified (i.e. this field is NULL),
* the cache will presume that writes are always permissable.
* the cache uses the following write_permitted field to
* determine whether writes are permitted.
*
* write_permitted: If check_write_permitted is NULL, this boolean flag
* indicates whether writes are permitted.
*
* log_flush: If provided, this function is called whenever a dirty
* entry is flushed to disk.
*
*
* The cache requires an index to facilitate searching for entries. The
@ -483,6 +503,16 @@
* id equal to the array index has been renamed in the current
* epoch.
*
* size_increases: Array of int64 of length H5C__MAX_NUM_TYPE_IDS + 1.
* The cells are used to record the number of times an entry
* with type id equal to the array index has increased in
* size in the current epoch.
*
* size_decreases: Array of int64 of length H5C__MAX_NUM_TYPE_IDS + 1.
* The cells are used to record the number of times an entry
* with type id equal to the array index has decreased in
* size in the current epoch.
*
* total_ht_insertions: Number of times entries have been inserted into the
* hash table in the current epoch.
*
@ -580,6 +610,8 @@ struct H5C_t
{
uint32_t magic;
void * aux_ptr;
int32_t max_type_id;
const char * (* type_name_table_ptr);
@ -587,6 +619,9 @@ struct H5C_t
size_t min_clean_size;
H5C_write_permitted_func_t check_write_permitted;
hbool_t write_permitted;
H5C_log_flush_func_t log_flush;
int32_t index_len;
size_t index_size;
@ -646,6 +681,8 @@ struct H5C_t
int64_t flushes[H5C__MAX_NUM_TYPE_IDS + 1];
int64_t evictions[H5C__MAX_NUM_TYPE_IDS + 1];
int64_t renames[H5C__MAX_NUM_TYPE_IDS + 1];
int64_t size_increases[H5C__MAX_NUM_TYPE_IDS + 1];
int64_t size_decreases[H5C__MAX_NUM_TYPE_IDS + 1];
int64_t total_ht_insertions;
int64_t total_ht_deletions;

View File

@ -36,6 +36,7 @@
#include "H5Fprivate.h" /* File access */
#define H5C_DO_SANITY_CHECKS 0
#define H5C_DO_EXTREME_SANITY_CHECKS 0
/* This sanity checking constant was picked out of the air. Increase
* or decrease it if appropriate. Its purposes is to detect corrupt
@ -149,6 +150,12 @@ typedef herr_t (*H5C_write_permitted_func_t)(const H5F_t *f,
hid_t dxpl_id,
hbool_t * write_permitted_ptr);
typedef herr_t (*H5C_log_flush_func_t)(H5C_t * cache_ptr,
haddr_t addr,
hbool_t was_dirty,
unsigned flags,
int type_id);
/* Upper and lower limits on cache size. These limits are picked
* out of a hat -- you should be able to change them as necessary.
*
@ -180,8 +187,9 @@ typedef herr_t (*H5C_write_permitted_func_t)(const H5F_t *f,
*
* In typical application, this structure is the first field in a
* structure to be cached. For historical reasons, the external module
* is responsible for managing the is_dirty field. All other fields are
* managed by the cache.
* is responsible for managing the is_dirty field (this is no longer
* completely true. See the comment on the is_dirty field for details).
* All other fields are managed by the cache.
*
* The fields of this structure are discussed individually below:
*
@ -220,6 +228,12 @@ typedef herr_t (*H5C_write_permitted_func_t)(const H5F_t *f,
* someday. However it will require a change in the
* cache interface.
*
* Update: Management of the is_dirty field has been largely
* moved into the cache. The only remaining exceptions
* are the flush and clear functions supplied by the
* modules using the cache. These still clear the
* is_dirty field as before. -- JRM 7/5/05
*
* is_protected: Boolean flag indicating whether this entry is protected
* (or locked, to use more conventional terms). When it is
* protected, the entry cannot be flushed or accessed until
@ -239,6 +253,18 @@ typedef herr_t (*H5C_write_permitted_func_t)(const H5F_t *f,
* H5AC__FLUSH_MARKED_ENTRIES_FLAG. The flag is reset when
* the entry is flushed for whatever reason.
*
* clear_on_unprotect: Boolean flag used only in PHDF5. When H5C is used
* to implement the metadata cache In the parallel case, only
* the cache with mpi rank 0 is allowed to actually write to
* file -- all other caches must retain dirty entries until they
* are advised that the entry is clean.
*
* This flag is used in the case that such an advisory is
* received when the entry is protected. If it is set when an
* entry is unprotected, and the dirtied flag is not set in
* the unprotect, the entry's is_dirty flag is reset by flushing
* it with the H5C__FLUSH_CLEAR_ONLY_FLAG.
*
*
* Fields supporting the hash table:
*
@ -344,6 +370,9 @@ typedef struct H5C_cache_entry_t
hbool_t is_protected;
hbool_t in_slist;
hbool_t flush_marker;
#ifdef H5_HAVE_PARALLEL
hbool_t clear_on_unprotect;
#endif /* H5_HAVE_PARALLEL */
/* fields supporting the hash table: */
@ -676,20 +705,24 @@ typedef struct H5C_auto_size_ctl_t
#define H5C__SET_FLUSH_MARKER_FLAG 0x0001
#define H5C__DELETED_FLAG 0x0002
/* This flag applies only to H5C_unprotect() */
/* These flags applies only to H5C_unprotect() */
#define H5C__DIRTIED_FLAG 0x0004
#define H5C__SIZE_CHANGED_FLAG 0x0008
/* These flags apply to H5C_flush() & H5C_flush_single_entry() */
#define H5C__FLUSH_INVALIDATE_FLAG 0x0008
#define H5C__FLUSH_CLEAR_ONLY_FLAG 0x0010
#define H5C__FLUSH_MARKED_ENTRIES_FLAG 0x0020
#define H5C__FLUSH_INVALIDATE_FLAG 0x0010
#define H5C__FLUSH_CLEAR_ONLY_FLAG 0x0020
#define H5C__FLUSH_MARKED_ENTRIES_FLAG 0x0040
H5_DLL H5C_t * H5C_create(size_t max_cache_size,
size_t min_clean_size,
int max_type_id,
const char * (* type_name_table_ptr),
H5C_write_permitted_func_t check_write_permitted);
const char * (* type_name_table_ptr),
H5C_write_permitted_func_t check_write_permitted,
hbool_t write_permitted,
H5C_log_flush_func_t log_flush,
void * aux_ptr);
H5_DLL void H5C_def_auto_resize_rpt_fcn(H5C_t * cache_ptr,
int32_t version,
@ -713,6 +746,11 @@ H5_DLL herr_t H5C_flush_cache(H5F_t * f,
H5C_t * cache_ptr,
unsigned flags);
H5_DLL herr_t H5C_flush_to_min_clean(H5F_t * f,
hid_t primary_dxpl_id,
hid_t secondary_dxpl_id,
H5C_t * cache_ptr);
H5_DLL herr_t H5C_get_cache_auto_resize_config(H5C_t * cache_ptr,
H5C_auto_size_ctl_t *config_ptr);
@ -725,6 +763,13 @@ H5_DLL herr_t H5C_get_cache_size(H5C_t * cache_ptr,
H5_DLL herr_t H5C_get_cache_hit_rate(H5C_t * cache_ptr,
double * hit_rate_ptr);
H5_DLL herr_t H5C_get_entry_status(H5C_t * cache_ptr,
haddr_t addr,
size_t * size_ptr,
hbool_t * in_cache_ptr,
hbool_t * is_dirty_ptr,
hbool_t * is_protected_ptr);
H5_DLL herr_t H5C_insert_entry(H5F_t * f,
hid_t primary_dxpl_id,
hid_t secondary_dxpl_id,
@ -734,6 +779,13 @@ H5_DLL herr_t H5C_insert_entry(H5F_t * f,
void * thing,
unsigned int flags);
H5_DLL herr_t H5C_mark_entries_as_clean(H5F_t * f,
hid_t primary_dxpl_id,
hid_t secondary_dxpl_id,
H5C_t * cache_ptr,
int32_t ce_array_len,
haddr_t * ce_array_ptr);
H5_DLL herr_t H5C_rename_entry(H5C_t * cache_ptr,
const H5C_class_t * type,
haddr_t old_addr,
@ -770,7 +822,8 @@ H5_DLL herr_t H5C_unprotect(H5F_t * f,
const H5C_class_t * type,
haddr_t addr,
void * thing,
unsigned flags);
unsigned int flags,
size_t new_size);
H5_DLL herr_t H5C_validate_resize_config(H5C_auto_size_ctl_t * config_ptr,
unsigned int tests);

View File

@ -2034,7 +2034,7 @@ H5D_update_entry_info(H5F_t *file, hid_t dxpl_id, H5D_t *dset, H5P_genplist_t *p
#endif /* H5O_ENABLE_BOGUS */
/* Add a modification time message. */
if (H5O_touch_oh(file, oh, TRUE, &oh_flags) < 0)
if (H5O_touch_oh(file, dxpl_id, oh, TRUE, &oh_flags) < 0)
HGOTO_ERROR(H5E_DATASET, H5E_CANTINIT, FAIL, "unable to update modification time message")
done:

View File

@ -67,7 +67,8 @@ static int H5F_flush_all_cb(void *f, hid_t fid, void *_invalidate);
static unsigned H5F_get_objects(const H5F_t *f, unsigned types, int max_objs, hid_t *obj_id_list);
static int H5F_get_objects_cb(void *obj_ptr, hid_t obj_id, void *key);
static herr_t H5F_get_vfd_handle(const H5F_t *file, hid_t fapl, void** file_handle);
static H5F_t *H5F_new(H5F_file_t *shared, hid_t fcpl_id, hid_t fapl_id);
static H5F_t *H5F_new(H5F_file_t *shared, hid_t fcpl_id, hid_t fapl_id,
H5FD_t *lf);
static herr_t H5F_dest(H5F_t *f, hid_t dxpl_id);
static herr_t H5F_flush(H5F_t *f, hid_t dxpl_id, H5F_scope_t scope, unsigned flags);
static herr_t H5F_close(H5F_t *f);
@ -1426,10 +1427,16 @@ done:
* Updated for the new metadata cache, and associated
* property list changes.
*
* J Mainzer, Jun 30, 2005
* Added lf parameter so the shared->lf field can be
* initialized prior to the call to H5AC_create() if a
* new instance of H5F_file_t is created. lf should be
* NULL if shared isn't, and vise versa.
*
*-------------------------------------------------------------------------
*/
static H5F_t *
H5F_new(H5F_file_t *shared, hid_t fcpl_id, hid_t fapl_id)
H5F_new(H5F_file_t *shared, hid_t fcpl_id, hid_t fapl_id, H5FD_t *lf)
{
H5F_t *f=NULL, *ret_value;
H5P_genplist_t *plist; /* Property list */
@ -1441,14 +1448,17 @@ H5F_new(H5F_file_t *shared, hid_t fcpl_id, hid_t fapl_id)
f->file_id = -1;
if (shared) {
HDassert( lf == NULL );
f->shared = shared;
} else {
HDassert( lf != NULL );
f->shared = H5FL_CALLOC(H5F_file_t);
f->shared->super_addr = HADDR_UNDEF;
f->shared->base_addr = HADDR_UNDEF;
f->shared->freespace_addr = HADDR_UNDEF;
f->shared->driver_addr = HADDR_UNDEF;
f->shared->lf = lf;
/*
* Copy the file creation and file access property lists into the
* new file handle. We do this early because some values might need
@ -1803,10 +1813,10 @@ H5F_open(const char *name, unsigned flags, hid_t fcpl_id, hid_t fapl_id, hid_t d
HGOTO_ERROR(H5E_FILE, H5E_CANTOPENFILE, NULL, "file is already open for read-only")
/* Allocate new "high-level" file struct */
if ((file = H5F_new(shared, fcpl_id, fapl_id)) == NULL)
if ((file = H5F_new(shared, fcpl_id, fapl_id, NULL)) == NULL)
HGOTO_ERROR(H5E_FILE, H5E_CANTOPENFILE, NULL, "unable to create new file object")
lf = file->shared->lf;
lf = file->shared->lf;
} else if (flags!=tent_flags) {
/*
* This file is not yet open by the library and the flags we used to
@ -1821,20 +1831,18 @@ H5F_open(const char *name, unsigned flags, hid_t fcpl_id, hid_t fapl_id, hid_t d
file = NULL; /*to prevent destruction of wrong file*/
HGOTO_ERROR(H5E_FILE, H5E_CANTOPENFILE, NULL, "unable to open file")
}
if (NULL==(file = H5F_new(NULL, fcpl_id, fapl_id)))
if (NULL==(file = H5F_new(NULL, fcpl_id, fapl_id, lf)))
HGOTO_ERROR(H5E_FILE, H5E_CANTOPENFILE, NULL, "unable to create new file object")
file->shared->flags = flags;
file->shared->lf = lf;
} else {
/*
* This file is not yet open by the library and our tentative opening
* above is good enough.
*/
if (NULL==(file = H5F_new(NULL, fcpl_id, fapl_id)))
if (NULL==(file = H5F_new(NULL, fcpl_id, fapl_id, lf)))
HGOTO_ERROR(H5E_FILE, H5E_CANTOPENFILE, NULL, "unable to create new file object")
file->shared->flags = flags;
file->shared->lf = lf;
}
/* Short cuts */
@ -2841,7 +2849,7 @@ H5Freopen(hid_t file_id)
HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, FAIL, "not a file")
/* Get a new "top level" file struct, sharing the same "low level" file struct */
if (NULL==(new_file=H5F_new(old_file->shared, H5P_FILE_CREATE_DEFAULT, H5P_FILE_ACCESS_DEFAULT)))
if (NULL==(new_file=H5F_new(old_file->shared, H5P_FILE_CREATE_DEFAULT, H5P_FILE_ACCESS_DEFAULT, NULL)))
HGOTO_ERROR(H5E_FILE, H5E_CANTINIT, FAIL, "unable to reopen file")
/* Keep old file's read/write intent in new file */
@ -3389,6 +3397,40 @@ H5F_mpi_get_comm(const H5F_t *f)
done:
FUNC_LEAVE_NOAPI(ret_value)
} /* end H5F_mpi_get_comm() */
/*-------------------------------------------------------------------------
* Function: H5F_mpi_get_size
*
* Purpose: Retrieves the size of an MPI process.
*
* Return: Success: The size (positive)
*
* Failure: Negative
*
* Programmer: John Mainzer
* Friday, May 6, 2005
*
* Modifications:
*
*-------------------------------------------------------------------------
*/
int
H5F_mpi_get_size(const H5F_t *f)
{
int ret_value;
FUNC_ENTER_NOAPI(H5F_mpi_get_size, FAIL)
assert(f && f->shared);
/* Dispatch to driver */
if ((ret_value=H5FD_mpi_get_size(f->shared->lf))<0)
HGOTO_ERROR(H5E_VFL, H5E_CANTGET, FAIL, "driver get_size request failed")
done:
FUNC_LEAVE_NOAPI(ret_value)
} /* end H5F_mpi_get_size() */
#endif /* H5_HAVE_PARALLEL */

View File

@ -2328,7 +2328,7 @@ H5FD_free(H5FD_t *file, H5FD_mem_t type, hid_t dxpl_id, haddr_t addr, hsize_t si
H5_ASSIGN_OVERFLOW(overlap_size,(file->accum_loc+file->accum_size)-addr,haddr_t,size_t);
/* Block to free is in the middle of the accumulator */
if(H5F_addr_lt(addr,file->accum_loc+file->accum_size)) {
if(H5F_addr_lt((addr + size), file->accum_loc + file->accum_size)) {
haddr_t tail_addr;
size_t tail_size;

View File

@ -934,6 +934,13 @@ done:
*
* Modifications:
*
* John Mainzer -- 9/21/05
* Modified code to turn off the
* H5FD_FEAT_ACCUMULATE_METADATA_WRITE flag.
* With the movement of
* all cache writes to process 0, this flag has become
* problematic in PHDF5.
*
*-------------------------------------------------------------------------
*/
static herr_t
@ -947,15 +954,6 @@ H5FD_mpio_query(const H5FD_t UNUSED *_file, unsigned long *flags /* out */)
if(flags) {
*flags=0;
*flags|=H5FD_FEAT_AGGREGATE_METADATA; /* OK to aggregate metadata allocations */
/* Distinguish between updating the metadata accumulator on writes and
* reads. This is particularly (perhaps only, even) important for MPI-I/O
* where we guarantee that writes are collective, but reads may not be.
* If we were to allow the metadata accumulator to be written during a
* read operation, the application would hang.
*/
*flags|=H5FD_FEAT_ACCUMULATE_METADATA_WRITE; /* OK to accumulate metadata for faster writes */
*flags|=H5FD_FEAT_AGGREGATE_SMALLDATA; /* OK to aggregate "small" raw data allocations */
} /* end if */
@ -1553,9 +1551,18 @@ H5FD_mpio_write(H5FD_t *_file, H5FD_mem_t type, hid_t dxpl_id, haddr_t addr,
if(H5P_get(plist,H5AC_BLOCK_BEFORE_META_WRITE_NAME,&block_before_meta_write)<0)
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "can't get H5AC property")
#if 0 /* JRM */
/* The metadata cache now only writes from process 0, which makes
* this synchronization incorrect. I'm leaving this code commented
* out instead of deleting it to remind us that we should re-write
* this function so that a metadata write from any other process
* should flag an error.
* -- JRM 9/1/05
*/
if(block_before_meta_write)
if (MPI_SUCCESS!= (mpi_code=MPI_Barrier(file->comm)))
HMPI_GOTO_ERROR(FAIL, "MPI_Barrier failed", mpi_code)
#endif /* JRM */
/* Only one process will do the actual write if all procs in comm write same metadata */
if (file->mpi_rank != H5_PAR_META_WRITE) {
@ -1616,11 +1623,22 @@ H5FD_mpio_write(H5FD_t *_file, H5FD_mem_t type, hid_t dxpl_id, haddr_t addr,
file->eof = HADDR_UNDEF;
done:
/* if only one process writes, need to broadcast the ret_value to other processes */
#if 0 /* JRM */
/* Since metadata writes are now done by process 0 only, this broadcast
* is no longer needed. I leave it in and commented out to remind us
* that we need to re-work this function to reflect this reallity.
*
* -- JRM 9/1/05
*/
/* if only one process writes, need to broadcast the ret_value to
* other processes
*/
if (type!=H5FD_MEM_DRAW) {
if (MPI_SUCCESS != (mpi_code=MPI_Bcast(&ret_value, sizeof(ret_value), MPI_BYTE, H5_PAR_META_WRITE, file->comm)))
HMPI_DONE_ERROR(FAIL, "MPI_Bcast failed", mpi_code)
} /* end if */
#endif /* JRM */
#ifdef H5FDmpio_DEBUG
if (H5FD_mpio_Debug[(int)'t'])

View File

@ -910,6 +910,12 @@ done:
*
* Modifications:
*
* John Mainzer -- 9/21/05
* Modified code to turn off the
* H5FD_FEAT_ACCUMULATE_METADATA_WRITE flag.
* With the movement of all cache writes to process 0,
* this flag has become problematic in PHDF5.
*
*-------------------------------------------------------------------------
*/
static herr_t
@ -923,15 +929,6 @@ H5FD_mpiposix_query(const H5FD_t UNUSED *_file, unsigned long *flags /* out */)
if(flags) {
*flags=0;
*flags|=H5FD_FEAT_AGGREGATE_METADATA; /* OK to aggregate metadata allocations */
/* Distinguish between updating the metadata accumulator on writes and
* reads. This is particularly (perhaps only, even) important for MPI-I/O
* where we guarantee that writes are collective, but reads may not be.
* If we were to allow the metadata accumulator to be written during a
* read operation, the application would hang.
*/
*flags|=H5FD_FEAT_ACCUMULATE_METADATA_WRITE; /* OK to accumulate metadata for faster writes */
*flags|=H5FD_FEAT_AGGREGATE_SMALLDATA; /* OK to aggregate "small" raw data allocations */
} /* end if */
@ -1235,6 +1232,14 @@ H5FD_mpiposix_write(H5FD_t *_file, H5FD_mem_t type, hid_t dxpl_id, haddr_t addr,
HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, FAIL, "not a file access property list")
/* Metadata specific actions */
/* All metadata is now written from process 0 -- thus this function
* needs to be re-written to reflect this. For now I have simply
* commented out the code that attempts to synchronize metadata
* writes between processes, but we should really just flag an error
* whenever any process other than process 0 attempts to write
* metadata.
* -- JRM 9/1/05
*/
if(type!=H5FD_MEM_DRAW) {
unsigned block_before_meta_write=0; /* Whether to block before a metadata write */
@ -1252,9 +1257,11 @@ H5FD_mpiposix_write(H5FD_t *_file, H5FD_mem_t type, hid_t dxpl_id, haddr_t addr,
if(H5P_get(plist,H5AC_BLOCK_BEFORE_META_WRITE_NAME,&block_before_meta_write)<0)
HGOTO_ERROR(H5E_PLIST, H5E_CANTGET, FAIL, "can't get H5AC property")
#if 0 /* JRM */
if(block_before_meta_write)
if (MPI_SUCCESS!= (mpi_code=MPI_Barrier(file->comm)))
HMPI_GOTO_ERROR(FAIL, "MPI_Barrier failed", mpi_code)
#endif /* JRM */
/* Only one process will do the actual write if all procs in comm write same metadata */
if (file->mpi_rank != H5_PAR_META_WRITE)
@ -1328,6 +1335,14 @@ done:
file->pos = HADDR_UNDEF;
file->op = OP_UNKNOWN;
} /* end if */
#if 0 /* JRM */
/* Since metadata writes are now done by process 0 only, this broadcast
* is no longer needed. I leave it in and commented out to remind us
* that we need to re-work this function to reflect this reallity.
*
* -- JRM 9/1/05
*/
/* Guard against getting into metadata broadcast in failure cases */
else {
/* when only one process writes, need to broadcast the ret_value to other processes */
@ -1336,6 +1351,7 @@ done:
HMPI_GOTO_ERROR(FAIL, "MPI_Bcast failed", mpi_code)
} /* end if */
} /* end else */
#endif /* JRM */
FUNC_LEAVE_NOAPI(ret_value)
} /* end H5FD_mpiposix_write() */

View File

@ -1628,7 +1628,14 @@ H5FD_multi_alloc(H5FD_t *_file, H5FD_mem_t type, hid_t dxpl_id, hsize_t size)
if (HADDR_UNDEF==(addr=H5FDalloc(file->memb[mmt], type, dxpl_id, size)))
H5Epush_ret(func, H5E_ERR_CLS, H5E_INTERNAL, H5E_BADVALUE, "member file can't alloc", HADDR_UNDEF)
addr += file->fa.memb_addr[mmt];
if (addr+size>file->eoa) file->eoa = addr+size;
if ( addr + size > file->eoa ) {
if ( H5FD_multi_set_eoa(_file, addr + size) < 0 ) {
H5Epush_ret(func, H5E_ERR_CLS, H5E_INTERNAL, H5E_BADVALUE, \
"can't set eoa", HADDR_UNDEF)
}
}
return addr;
}

View File

@ -448,6 +448,7 @@ H5_DLL haddr_t H5F_get_eoa(const H5F_t *f);
#ifdef H5_HAVE_PARALLEL
H5_DLL int H5F_mpi_get_rank(const H5F_t *f);
H5_DLL MPI_Comm H5F_mpi_get_comm(const H5F_t *f);
H5_DLL int H5F_mpi_get_size(const H5F_t *f);
#endif /* H5_HAVE_PARALLEL */
/* Functions than check file mounting information */

View File

@ -155,7 +155,6 @@ H5HL_create(H5F_t *f, hid_t dxpl_id, size_t size_hint, haddr_t *addr_p/*out*/)
heap->addr = *addr_p + (hsize_t)sizeof_hdr;
heap->disk_alloc = size_hint;
heap->mem_alloc = size_hint;
heap->disk_resrv = 0;
if (NULL==(heap->chunk = H5FL_BLK_CALLOC(heap_chunk,(sizeof_hdr + size_hint))))
HGOTO_ERROR (H5E_RESOURCE, H5E_NOSPACE, FAIL, "memory allocation failed");
@ -320,6 +319,20 @@ done:
*
* Modifications:
*
* John Mainzer, 8/10/05
* Reworked this function for a different role.
*
* It used to be called during cache eviction, where it
* attempted to size the disk space allocation for the
* actuall size of the heap. However, this causes problems
* in the parallel case, as the reuslting disk allocations
* may not be synchronized.
*
* It is now called from H5HL_remove(), where it is used to
* reduce heap size in response to an entry deletion. This
* means that the function should either do nothing, or
* reduce the size of the disk allocation.
*
*-------------------------------------------------------------------------
*/
static herr_t
@ -331,18 +344,12 @@ H5HL_minimize_heap_space(H5F_t *f, hid_t dxpl_id, H5HL_t *heap)
FUNC_ENTER_NOAPI(H5HL_minimize_heap_space, FAIL)
/* check args */
assert(f);
assert(heap);
HDassert( f );
HDassert( heap );
HDassert( heap->disk_alloc == heap->mem_alloc );
sizeof_hdr = H5HL_SIZEOF_HDR(f); /* cache H5HL header size for file */
/*
* When the heap is being flushed to disk, release the file space reserved
* for it.
*/
H5MF_free_reserved(f, (hsize_t)heap->disk_resrv);
heap->disk_resrv = 0;
/*
* Check to see if we can reduce the size of the heap in memory by
* eliminating free blocks at the tail of the buffer before flushing the
@ -427,13 +434,15 @@ H5HL_minimize_heap_space(H5F_t *f, hid_t dxpl_id, H5HL_t *heap)
}
/*
* If the heap grew larger or smaller than disk storage then move the
* If the heap grew smaller than disk storage then move the
* data segment of the heap to another contiguous block of disk
* storage.
*/
if (heap->mem_alloc != heap->disk_alloc) {
haddr_t old_addr = heap->addr, new_addr;
HDassert( heap->mem_alloc < heap->disk_alloc );
/* Release old space on disk */
H5_CHECK_OVERFLOW(heap->disk_alloc, size_t, hsize_t);
H5MF_xfree(f, H5FD_MEM_LHEAP, dxpl_id, old_addr, (hsize_t)heap->disk_alloc);
@ -453,7 +462,8 @@ H5HL_minimize_heap_space(H5F_t *f, hid_t dxpl_id, H5HL_t *heap)
done:
FUNC_LEAVE_NOAPI(ret_value)
}
} /* H5HL_minimize_heap_space() */
/*-------------------------------------------------------------------------
@ -543,6 +553,13 @@ H5HL_serialize(H5F_t *f, H5HL_t *heap, uint8_t *buf)
*
* Bill Wendling, 2003-09-16
* Separated out the bit that serializes the heap.
*
* John Mainzer, 2005-08-10
* Removed call to H5HL_minimize_heap_space(). It does disk space
* allocation, which can cause problems if done at flush time.
* Instead, disk space allocation/deallocation is now done at
* insert/remove time.
*
*-------------------------------------------------------------------------
*/
static herr_t
@ -553,21 +570,18 @@ H5HL_flush(H5F_t *f, hid_t dxpl_id, hbool_t destroy, haddr_t addr, H5HL_t *heap)
FUNC_ENTER_NOAPI(H5HL_flush, FAIL);
/* check arguments */
assert(f);
assert(H5F_addr_defined(addr));
assert(heap);
HDassert( f );
HDassert( H5F_addr_defined(addr) );
HDassert( heap );
HDassert( heap->disk_alloc == heap->mem_alloc );
if (heap->cache_info.is_dirty) {
haddr_t hdr_end_addr;
size_t sizeof_hdr = H5HL_SIZEOF_HDR(f); /* cache H5HL header size for file */
/* Minimize the heap space size if possible */
if (H5HL_minimize_heap_space(f, dxpl_id, heap) < 0)
HGOTO_ERROR(H5E_HEAP, H5E_CANTFREE, FAIL, "unable to minimize local heap space")
/* Write the header */
if (H5HL_serialize(f, heap, heap->chunk) < 0)
HGOTO_ERROR(H5E_BTREE, H5E_CANTSERIALIZE, FAIL, "unable to serialize local heap")
HGOTO_ERROR(H5E_HEAP, H5E_CANTSERIALIZE, FAIL, "unable to serialize local heap")
/* Copy buffer to disk */
hdr_end_addr = addr + (hsize_t)sizeof_hdr;
@ -951,6 +965,10 @@ H5HL_remove_free(H5HL_t *heap, H5HL_free_t *fl)
* H5AC_unprotect() instead of manipulating the is_dirty
* field of the cache info directly.
*
* John Mainzer, 8/10/05
* Modified code to allocate file space as needed, instead
* of allocating it on eviction.
*
*-------------------------------------------------------------------------
*/
size_t
@ -959,10 +977,12 @@ H5HL_insert(H5F_t *f, hid_t dxpl_id, haddr_t addr, size_t buf_size, const void *
H5HL_t *heap = NULL;
unsigned heap_flags = H5AC__NO_FLAGS_SET;
H5HL_free_t *fl = NULL, *max_fl = NULL;
htri_t tri_result;
herr_t result;
size_t offset = 0;
size_t need_size, old_size, need_more;
size_t new_disk_alloc;
hbool_t found;
size_t disk_resrv; /* Amount of additional space to reserve in file */
size_t sizeof_hdr; /* Cache H5HL header size for file */
size_t ret_value; /* Return value */
@ -1028,18 +1048,54 @@ H5HL_insert(H5F_t *f, hid_t dxpl_id, haddr_t addr, size_t buf_size, const void *
if (found==FALSE) {
need_more = MAX3(need_size, heap->mem_alloc, H5HL_SIZEOF_FREE(f));
/* Reserve space in the file to hold the increased heap size
*/
if( heap->disk_resrv == heap->mem_alloc)
disk_resrv = need_more;
else
disk_resrv = heap->mem_alloc + need_more - heap->disk_resrv;
new_disk_alloc = heap->disk_alloc + need_more;
HDassert( heap->disk_alloc < new_disk_alloc );
H5_CHECK_OVERFLOW(heap->disk_alloc, size_t, hsize_t);
H5_CHECK_OVERFLOW(new_disk_alloc, size_t, hsize_t);
if( H5MF_reserve(f, (hsize_t)disk_resrv) < 0 )
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, (size_t)(-1), "unable to reserve space in file");
/* extend the current heap if we can... */
tri_result = H5MF_can_extend(f, H5FD_MEM_LHEAP, heap->addr,
(hsize_t)(heap->disk_alloc),
(hsize_t)need_more);
if ( tri_result == TRUE ) {
/* Update heap's record of how much space it has reserved */
heap->disk_resrv += disk_resrv;
result = H5MF_extend(f, H5FD_MEM_LHEAP, heap->addr,
(hsize_t)(heap->disk_alloc),
(hsize_t)need_more);
if ( result < 0 ) {
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, (size_t)(-1), \
"can't extend heap on disk");
}
heap->disk_alloc = new_disk_alloc;
} else { /* ...if we can't, allocate a new chunk & release the old */
haddr_t old_addr = heap->addr;
haddr_t new_addr;
/* The new allocation may fail -- to avoid the possiblity of
* file corruption, allocate the new heap first, and then
* deallocate the old.
*/
/* allocate new disk space for the heap */
if ( (new_addr = H5MF_alloc(f, H5FD_MEM_LHEAP, dxpl_id,
(hsize_t)new_disk_alloc)) == HADDR_UNDEF ) {
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, (size_t)(-1), \
"unable to allocate file space for heap")
}
/* Release old space on disk */
H5MF_xfree(f, H5FD_MEM_LHEAP, dxpl_id, old_addr,
(hsize_t)heap->disk_alloc);
H5E_clear_stack(NULL); /* don't really care if the free failed */
heap->addr = new_addr;
heap->disk_alloc = new_disk_alloc;
}
if (max_fl && max_fl->offset + max_fl->size == heap->mem_alloc) {
/*
@ -1117,7 +1173,8 @@ done:
HDONE_ERROR(H5E_HEAP, H5E_PROTECT, (size_t)(-1), "unable to release object header");
FUNC_LEAVE_NOAPI(ret_value);
}
} /* H5HL_insert() */
#ifdef NOT_YET
@ -1217,6 +1274,11 @@ done:
* H5AC_unprotect() instead of manipulating the is_dirty
* field of the cache info directly.
*
* John Mainzer, 8/10/05
* Modified code to attempt to decrease heap size if the
* entry removal results in a free list entry at the end
* of the heap that is at least half the size of the heap.
*
*-------------------------------------------------------------------------
*/
herr_t
@ -1225,15 +1287,15 @@ H5HL_remove(H5F_t *f, hid_t dxpl_id, haddr_t addr, size_t offset, size_t size)
H5HL_t *heap = NULL;
unsigned heap_flags = H5AC__NO_FLAGS_SET;
H5HL_free_t *fl = NULL, *fl2 = NULL;
herr_t ret_value=SUCCEED; /* Return value */
herr_t ret_value = SUCCEED; /* Return value */
FUNC_ENTER_NOAPI(H5HL_remove, FAIL);
/* check arguments */
assert(f);
assert(H5F_addr_defined(addr));
assert(size > 0);
assert (offset==H5HL_ALIGN (offset));
HDassert( f );
HDassert( H5F_addr_defined(addr) );
HDassert( size > 0 );
HDassert( offset == H5HL_ALIGN(offset) );
if (0==(f->intent & H5F_ACC_RDWR))
HGOTO_ERROR (H5E_HEAP, H5E_WRITEERROR, FAIL, "no write intent on file");
@ -1243,8 +1305,9 @@ H5HL_remove(H5F_t *f, hid_t dxpl_id, haddr_t addr, size_t offset, size_t size)
if (NULL == (heap = H5AC_protect(f, dxpl_id, H5AC_LHEAP, addr, NULL, NULL, H5AC_WRITE)))
HGOTO_ERROR(H5E_HEAP, H5E_PROTECT, FAIL, "unable to load heap");
assert(offset < heap->mem_alloc);
assert(offset + size <= heap->mem_alloc);
HDassert( offset < heap->mem_alloc );
HDassert( offset + size <= heap->mem_alloc );
HDassert( heap->disk_alloc == heap->mem_alloc );
fl = heap->freelist;
heap_flags |= H5AC__DIRTIED_FLAG;
@ -1268,10 +1331,29 @@ H5HL_remove(H5F_t *f, hid_t dxpl_id, haddr_t addr, size_t offset, size_t size)
assert (fl->offset==H5HL_ALIGN (fl->offset));
assert (fl->size==H5HL_ALIGN (fl->size));
fl2 = H5HL_remove_free(heap, fl2);
if ( ( (fl->offset + fl->size) == heap->mem_alloc ) &&
( (2 * fl->size) > heap->mem_alloc ) ) {
if ( H5HL_minimize_heap_space(f, dxpl_id, heap) !=
SUCCEED ) {
HGOTO_ERROR(H5E_RESOURCE, H5E_CANTFREE, FAIL, \
"heap size minimization failed");
}
}
HGOTO_DONE(SUCCEED);
}
fl2 = fl2->next;
}
if ( ( (fl->offset + fl->size) == heap->mem_alloc ) &&
( (2 * fl->size) > heap->mem_alloc ) ) {
if ( H5HL_minimize_heap_space(f, dxpl_id, heap) != SUCCEED ) {
HGOTO_ERROR(H5E_RESOURCE, H5E_CANTFREE, FAIL, \
"heap size minimization failed");
}
}
HGOTO_DONE(SUCCEED);
} else if (fl->offset + fl->size == offset) {
@ -1283,10 +1365,29 @@ H5HL_remove(H5F_t *f, hid_t dxpl_id, haddr_t addr, size_t offset, size_t size)
fl->size += fl2->size;
assert (fl->size==H5HL_ALIGN (fl->size));
fl2 = H5HL_remove_free(heap, fl2);
if ( ( (fl->offset + fl->size) == heap->mem_alloc ) &&
( (2 * fl->size) > heap->mem_alloc ) ) {
if ( H5HL_minimize_heap_space(f, dxpl_id, heap) !=
SUCCEED ) {
HGOTO_ERROR(H5E_RESOURCE, H5E_CANTFREE, FAIL, \
"heap size minimization failed");
}
}
HGOTO_DONE(SUCCEED);
}
fl2 = fl2->next;
}
if ( ( (fl->offset + fl->size) == heap->mem_alloc ) &&
( (2 * fl->size) > heap->mem_alloc ) ) {
if ( H5HL_minimize_heap_space(f, dxpl_id, heap) != SUCCEED ) {
HGOTO_ERROR(H5E_RESOURCE, H5E_CANTFREE, FAIL, \
"heap size minimization failed");
}
}
HGOTO_DONE(SUCCEED);
}
fl = fl->next;
@ -1321,6 +1422,16 @@ H5HL_remove(H5F_t *f, hid_t dxpl_id, haddr_t addr, size_t offset, size_t size)
heap->freelist->prev = fl;
heap->freelist = fl;
if ( ( (fl->offset + fl->size) == heap->mem_alloc ) &&
( (2 * fl->size) > heap->mem_alloc ) ) {
if ( H5HL_minimize_heap_space(f, dxpl_id, heap) != SUCCEED ) {
HGOTO_ERROR(H5E_RESOURCE, H5E_CANTFREE, FAIL, \
"heap size minimization failed");
}
}
done:
if (heap && H5AC_unprotect(f, dxpl_id, H5AC_LHEAP, addr, heap, heap_flags) != SUCCEED)
HDONE_ERROR(H5E_HEAP, H5E_PROTECT, FAIL, "unable to release object header");

View File

@ -67,7 +67,6 @@ struct H5HL_t {
haddr_t addr; /*address of data */
size_t disk_alloc; /*data bytes allocated on disk */
size_t mem_alloc; /*data bytes allocated in mem */
size_t disk_resrv; /*data bytes "reserved" on disk */
uint8_t *chunk; /*the chunk, including header */
H5HL_free_t *freelist; /*the free list */
};

View File

@ -399,9 +399,9 @@ done:
*
* Purpose: Extend a block in the file.
*
* Return: Success: TRUE(1)/FALSE(0)
* Return: Success: Non-negative
*
* Failure: FAIL
* Failure: Negative
*
* Programmer: Quincey Koziol
* Saturday, June 12, 2004
@ -410,10 +410,10 @@ done:
*
*-------------------------------------------------------------------------
*/
htri_t
herr_t
H5MF_extend(H5F_t *f, H5FD_mem_t type, haddr_t addr, hsize_t size, hsize_t extra_requested)
{
htri_t ret_value; /* Return value */
herr_t ret_value; /* Return value */
FUNC_ENTER_NOAPI(H5MF_extend, FAIL);

View File

@ -53,7 +53,7 @@ H5_DLL herr_t H5MF_free_reserved(H5F_t *f, hsize_t size);
H5_DLL hbool_t H5MF_alloc_overflow(H5F_t *f, hsize_t size);
H5_DLL htri_t H5MF_can_extend(H5F_t *f, H5FD_mem_t type, haddr_t addr,
hsize_t size, hsize_t extra_requested);
H5_DLL htri_t H5MF_extend(H5F_t *f, H5FD_mem_t type, haddr_t addr, hsize_t size,
H5_DLL herr_t H5MF_extend(H5F_t *f, H5FD_mem_t type, haddr_t addr, hsize_t size,
hsize_t extra_requested);
#endif

742
src/H5O.c

File diff suppressed because it is too large Load Diff

View File

@ -268,11 +268,11 @@ H5_DLL herr_t H5O_unprotect(H5G_entry_t *ent, struct H5O_t *oh, hid_t dxpl_id,
H5_DLL int H5O_append(H5F_t *f, hid_t dxpl_id, struct H5O_t *oh, unsigned type_id,
unsigned flags, const void *mesg, unsigned * oh_flags_ptr);
H5_DLL herr_t H5O_touch(H5G_entry_t *ent, hbool_t force, hid_t dxpl_id);
H5_DLL herr_t H5O_touch_oh(H5F_t *f, struct H5O_t *oh, hbool_t force,
unsigned * oh_flags_ptr);
H5_DLL herr_t H5O_touch_oh(H5F_t *f, hid_t dxpl_id, struct H5O_t *oh,
hbool_t force, unsigned * oh_flags_ptr);
#ifdef H5O_ENABLE_BOGUS
H5_DLL herr_t H5O_bogus(H5G_entry_t *ent, hid_t dxpl_id);
H5_DLL herr_t H5O_bogus_oh(H5F_t *f, struct H5O_t *oh,
H5_DLL herr_t H5O_bogus_oh(H5F_t *f, hid_t dxpl_id, struct H5O_t *oh,
unsigned * oh_flags_ptr);
#endif /* H5O_ENABLE_BOGUS */
H5_DLL herr_t H5O_remove(H5G_entry_t *ent, unsigned type_id, int sequence,

View File

@ -1767,7 +1767,10 @@ setup_cache(size_t max_cache_size,
min_clean_size,
(NUMBER_OF_ENTRY_TYPES - 1),
(const char **)entry_type_names,
check_write_permitted);
check_write_permitted,
TRUE,
NULL,
NULL);
if ( cache_ptr == NULL ) {
@ -2186,6 +2189,10 @@ protect_entry(H5C_t * cache_ptr,
* Modified function to use the new dirtied parameter of
* H5C_unprotect().
*
* JRM -- 9/8/05
* Update for new entry size parameter in H5C_unprotect().
* We don't use them here for now.
*
*-------------------------------------------------------------------------
*/
@ -2226,7 +2233,7 @@ unprotect_entry(H5C_t * cache_ptr,
result = H5C_unprotect(NULL, -1, -1, cache_ptr, &(types[type]),
entry_ptr->addr, (void *)entry_ptr,
flags);
flags, 0);
if ( ( result < 0 ) ||
( entry_ptr->header.is_protected ) ||
@ -8530,6 +8537,10 @@ check_double_protect_err(void)
* Modified function to use the new dirtied parameter in
* H5C_unprotect().
*
* JRM -- 9/8/05
* Updated function for the new size change parameter in
* H5C_unprotect(). We don't use them for now.
*
*-------------------------------------------------------------------------
*/
@ -8567,8 +8578,8 @@ check_double_unprotect_err(void)
if ( pass ) {
result = H5C_unprotect(NULL, -1, -1, cache_ptr, &(types[0]),
entry_ptr->addr, (void *)entry_ptr,
H5C__NO_FLAGS_SET);
entry_ptr->addr, (void *)entry_ptr,
H5C__NO_FLAGS_SET, 0);
if ( result > 0 ) {

View File

@ -439,7 +439,7 @@ void big_dataset(void)
/* Check that file of the correct size was created */
file_size=h5_mpi_get_file_size(filename, MPI_COMM_WORLD, MPI_INFO_NULL);
#ifndef WIN32
VRFY((file_size == 2147485696ULL), "File is correct size");
VRFY((file_size == 2147485696ULL), "File is correct size (~2GB)");
#endif
/*
@ -470,7 +470,7 @@ void big_dataset(void)
/* Check that file of the correct size was created */
file_size=h5_mpi_get_file_size(filename, MPI_COMM_WORLD, MPI_INFO_NULL);
#ifndef WIN32
VRFY((file_size == 4294969344ULL), "File is correct size");
VRFY((file_size == 4294969344ULL), "File is correct size (~4GB)");
#endif
/*
@ -501,7 +501,7 @@ void big_dataset(void)
/* Check that file of the correct size was created */
file_size=h5_mpi_get_file_size(filename, MPI_COMM_WORLD, MPI_INFO_NULL);
#ifndef WIN32
VRFY((file_size == 8589936640ULL), "File is correct size");
VRFY((file_size == 8589936640ULL), "File is correct size (~8GB)");
#endif
/* Close fapl */