hdf5/doc/file-locking.md
Dana Robinson fe9c07fd90
Adds file locking documentation (#2084)
2022-09-14 05:53:35 -07:00


File Locking in HDF5

This document describes the file locking scheme that was added to HDF5 in version 1.10.0 and how you can work around it, if you choose to do so. I'll try to keep it understandable for everyone, though diving into technical details is unavoidable, given the complexity of the material. We're in the process of converting the HDF5 user guide (UG) to Doxygen and this document will eventually be rolled up into those files as we update things.

Parallel HDF5 Note

Everything written here is from the perspective of serial HDF5. When we say that you can't access a file for write access from more than one process, we mean "from more than one independent, serial process". Parallel HDF5 can obviously write to a file from more than one process, but that involves IPC and multiple processes working together, not independent processes with no knowledge of each other, which is what the file locks are for.

Why file locks?

The short answer is: "To prevent you from corrupting your HDF5 files and/or crashing your reader processes."

The long answer is more complicated.

An HDF5 file's state exists in two places when it is open for writing:

  1. The HDF5 file itself
  2. The HDF5 library's various caches

One of those caches is the metadata cache, which stores things like B-tree nodes that we use to locate data in the file. Problems arise when parent objects are flushed to storage before child objects. If a reader tries to load children that have not been flushed yet, the file offset stored in the parent could point at garbage, and the reader will encounter library failures as it tries to access the non-existent objects.
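A toy model may make the flush-order problem more concrete. The sketch below is not HDF5 code: the dictionary-based `storage`, the offsets, and the `read_child` helper are hypothetical stand-ins for a file, file addresses, and a reader following a parent-to-child pointer.

```python
def read_child(storage, parent_offset):
    """Follow the parent's child pointer the way a reader process would."""
    parent = storage[parent_offset]
    child_offset = parent["child_offset"]
    if child_offset not in storage:
        # The parent is in storage, but the child it points to is not (yet).
        raise LookupError(f"no object at offset {child_offset}")
    return storage[child_offset]["payload"]


def flush_order_demo():
    # The writer's cache holds a parent node that points at a child node.
    cache = {0: {"child_offset": 64}, 64: {"payload": "leaf"}}
    storage = {}

    storage[0] = cache[0]  # parent flushed before its child
    try:
        read_child(storage, 0)
        early_read = "ok"
    except LookupError:
        early_read = "failed"

    storage[64] = cache[64]  # child flushed afterwards
    late_read = read_child(storage, 0)
    return early_read, late_read
```

In this model, a read that lands between the two flushes fails, while the same read succeeds once the child has reached storage.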

Keep in mind that the HDF5 library is not analogous to a database server. The HDF5 library is just a simple shared library, like libjpeg. Library state is maintained per-library-instance and there is no IPC between HDF5 libraries loaded by different processes (exception: collective operations in parallel HDF5, but that's not what we're talking about here).

Prior to HDF5 1.10.0, concurrent access to an HDF5 file by multiple processes, when one or more processes is a writer, was not supported. There was no enforcement mechanism for this. We simply told people not to do it.

In HDF5 1.10.0, we updated the library to allow the single-writer / multiple-readers (SWMR - pronounced "swimmer") access pattern. This setup allows one writer and multiple readers to access the same file, as long as a certain protocol is followed concerning file opening order and setting the right flags. Since synchronization might be tricky to pull off and the consequences of getting it wrong could result in corrupt files or crashed readers, we decided to add a file locking scheme to help users get it right. Since this would also help prevent harmful accesses when SWMR is not in use, we decided to switch the file locking scheme on by default. This scheme has been carried forward into HDF5 1.12 and 1.13 (soon to be 1.14).

Note that the current implementation of SWMR is only useful for appending to chunked datasets. Creating file objects like groups and datasets is not supported in the current SWMR implementation.

Unfortunately, this file locking scheme has caused problems for some users. These are usually people working on network file systems like NFS or on parallel file systems, especially when file locks have been disabled on the file system, which often causes lock calls to fail. As a result, we've added work-arounds over the years that let you disable the file locking scheme.

The existing scheme

There are two parts to the file locking scheme. One is the file lock itself. The second is a mark we make in the HDF5 file's superblock. The superblock mark isn't really that important for understanding the file locking, but since it's entwined with the file locking scheme, we'll cover it in the algorithm below. The lower-level details of file lock implementations are described in the appendix, but the semantics are straightforward: Locks are mandatory unless disabled, always for the entire file, and non-blocking. They are also not required for SWMR operations and simply exist to help you set up SWMR and prevent dangerous file access.

Here's how it all works:

  1. The first thing we do is check if we're using file locks

    • We first check the file locking property in the file access property list (fapl). The default value of this property is set at configure time when the library is built.
    • Next we check the value of the HDF5_USE_FILE_LOCKING environment variable, which was previously parsed at library startup. If this is set, we use the value to override the property list setting.

    The particulars of the ways you can disable file locks are described in a separate section below.

    If we are not using file locking, no further file locking operations will take place.

  2. We also check whether we should ignore file locks when they are disabled on the file system.

    • The environment variable setting for this is checked at VFD initialization time for all library VFDs.
    • We check the value in the fapl in the open callback. The default value for this property was set at configure time when the library was built.
  3. When we open a file, we lock it based on the file access flags:

    • If the H5F_ACC_RDWR flag is set, use an exclusive lock
    • Otherwise use a shared lock

    If we are ignoring disabled file locks (see below), we will silently swallow lock API call failures when locks are not implemented on the file system.

  4. If the VFD supports locking and the file is open for writing, we mark the file consistency flags in the file's superblock to indicate this.

    NOTE!

    • The VFD has to have a lock callback for this to happen. It doesn't matter if the locking was disabled - the check is simply for the callback.
    • We mark the superblock in ANY write case - both SWMR and non-SWMR.
    • Only the latest version of the superblock is marked in this way. If you open up a file that wasn't created with the 1.10.0 or later file format, it won't get the superblock mark, even if it's been opened for writing.

    According to the file format document and H5Fpkg.h:

    • Bit 0 is set if the file is open for writing (H5F_SUPER_WRITE_ACCESS)
    • Bit 2 is set if the file is open for SWMR writing (H5F_SUPER_SWMR_WRITE_ACCESS)

    We check these superblock flags on file open and error out if they are unsuitable.

    • If the file is already opened for non-SWMR writing, no other process can open it.
    • If the file is open for SWMR writing, only SWMR readers can open the file.
    • If you try to open a file for reading with H5F_ACC_SWMR_READ set and the file does not have the SWMR writer bits set in the superblock, the open call will fail.

    This scheme is often confused with the file locking, so it's included here, even though it's a bit tangential to the locks themselves.

  5. If the file is open for SWMR writing (H5F_ACC_SWMR_WRITE is set), we remove the file lock just before the open call completes.

  6. We normally don't explicitly unlock the file on file close. We let the OS handle it when the file descriptors are closed since file locks don't normally survive closing the underlying file descriptor.
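The lock choice and the superblock checks in the steps above can be summarized in a short sketch. This is a model of the rules as described in this document, not the library's code. The function names are hypothetical, and the numeric values given to the access flags are placeholders chosen for illustration. Only the superblock bit positions (bit 0 and bit 2) come from the text above.

```python
# Placeholder values for the file access flags (illustrative only).
ACC_RDONLY = 0x00
ACC_RDWR = 0x01
ACC_SWMR_WRITE = 0x20
ACC_SWMR_READ = 0x40

# Superblock file consistency flags, using the bit positions given above.
SUPER_WRITE_ACCESS = 1 << 0       # bit 0: open for writing
SUPER_SWMR_WRITE_ACCESS = 1 << 2  # bit 2: open for SWMR writing


def lock_for_open(access_flags):
    """Step 3: exclusive lock for read/write opens, shared lock otherwise."""
    return "exclusive" if access_flags & ACC_RDWR else "shared"


def lock_after_open(access_flags):
    """Step 5: SWMR writers drop the lock just before the open completes."""
    if access_flags & ACC_SWMR_WRITE:
        return None
    return lock_for_open(access_flags)


def open_allowed(status_flags, access_flags):
    """Step 4: decide whether the superblock mark permits this open."""
    swmr_writer = bool(status_flags & SUPER_SWMR_WRITE_ACCESS)
    any_writer = bool(status_flags & SUPER_WRITE_ACCESS)
    swmr_reader = bool(access_flags & ACC_SWMR_READ)
    if swmr_writer:
        return swmr_reader   # only SWMR readers may join a SWMR writer
    if any_writer:
        return False         # a non-SWMR writer excludes everyone else
    return not swmr_reader   # SWMR read requires a SWMR writer mark
```

For example, `open_allowed(SUPER_WRITE_ACCESS, ACC_RDONLY)` is `False` in this model, which corresponds to a plain reader being refused while a non-SWMR writer has the file open.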

TL;DR

When locks are available, HDF5 files will be locked while they are in use: exclusively when opened for writing, shared when opened read-only. The exception is files that are opened for SWMR writing, which are unlocked. Files that are open for any kind of writing get a "writing" superblock mark that HDF5 1.10.0+ will respect and refuse to open outside of SWMR.

H5Fstart_swmr_write()

This API call can be used to switch an open file to "SWMR writing" mode as if it had been opened with the H5F_ACC_SWMR_WRITE flag set. This is used when code needs to perform SWMR-forbidden operations like creating groups and datasets before appending data to datasets using SWMR.

Most of the work of this API call involves flushing out the library caches in preparation for SWMR access, but there are a few locking operations that take place under the hood:

  • The file's superblock is marked as in the SWMR writer case, above.
  • For a brief period of time in the call, we convert the exclusive lock to a shared lock. It's unclear why this was done and we'll look into removing this.
  • At the end of the call, the lock is removed, as in the SWMR write open case described above.
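To see what other processes would observe during those lock transitions, here is a small Unix-only sketch that replays the same sequence (exclusive, then shared, then unlocked) with `flock()` on a temporary file. It uses Python's standard `fcntl` module rather than the HDF5 library, and the helper names are hypothetical. A second descriptor opened on the same file stands in for another process.

```python
import fcntl
import os
import tempfile


def can_lock(path, operation):
    """Return True if a fresh descriptor can take the lock without blocking."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, operation | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)  # closing the probe descriptor releases any lock it took


def swmr_start_lock_trace():
    with tempfile.NamedTemporaryFile() as tmp:
        writer = os.open(tmp.name, os.O_RDWR)
        try:
            trace = []
            # Writer holds the exclusive lock it took at open time.
            fcntl.flock(writer, fcntl.LOCK_EX | fcntl.LOCK_NB)
            trace.append(("exclusive", can_lock(tmp.name, fcntl.LOCK_SH)))
            # Brief conversion to a shared lock.
            fcntl.flock(writer, fcntl.LOCK_SH | fcntl.LOCK_NB)
            trace.append(("shared", can_lock(tmp.name, fcntl.LOCK_SH)))
            # Lock removed at the end of the call.
            fcntl.flock(writer, fcntl.LOCK_UN)
            trace.append(("unlocked", can_lock(tmp.name, fcntl.LOCK_EX)))
            return trace
        finally:
            os.close(writer)
```

Each entry in the returned trace pairs the writer's lock state with whether the probe could lock the file at that point.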

Disabling the locks

There are several ways to disable the locks, depending on which version of the HDF5 library you are working with. This section will describe the file lock disable schemes as they exist in late 2022. The current library versions at this time are 1.10.9, 1.12.2, and 1.13.2. File locks are not present in HDF5 1.8. The lock feature matrix later in this document will describe the limitations of earlier versions.

Configure option

You can set the file locking defaults at configure time. This sets the defaults for the associated properties in the fapl. Users can override the configure defaults using H5Pset_file_locking() or the HDF5_USE_FILE_LOCKING environment variable.

  • Autotools

    --enable-file-locking=(yes|no|best-effort) sets the file locking behavior. yes and no should be self-explanatory. best-effort turns file locking on but ignores file locks when they are disabled (default: best-effort).

  • CMake

    • set IGNORE_DISABLED_FILE_LOCK to ON to ignore file locks when they are disabled on the file system (default: ON).
    • set HDF5_USE_FILE_LOCKING to OFF to disable file locks (default: ON)

H5Pset_file_locking()

This API call can be used to override the configure defaults. It takes hbool_t parameters for both the file locking and "ignore file locks when disabled on the file system" parameters. The values set here can be overridden by the file locking environment variable.

There is a corresponding H5Pget_file_locking() call that can be used to check the currently set values of both properties in the fapl. NOTE that this call just checks the property list values. It does NOT check the environment variables!

Environment variables

The HDF5_USE_FILE_LOCKING environment variable overrides all other file locking settings.

HDF5 1.10.0:

  • No file locking environment variable

HDF5 1.10.1 - 1.10.6, 1.12.0:

  • FALSE turns file locking off
  • Anything else turns file locking on
  • Neither of these values ignores disabled file locks
  • Environment variable parsed at file create/open time

HDF5 1.10.7+, 1.12.1+, 1.13.x:

  • FALSE or 0 disables file locking
  • TRUE or 1 enables file locking
  • BEST_EFFORT enables file locking and ignores disabled file locks
  • Anything else gives you the defaults
  • Environment variable parsed at library startup
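The two generations of environment variable handling can be modeled as follows. This is a sketch of the rules listed above, not the library's parser. The function names are hypothetical, and treating the comparison as an exact, case-sensitive string match is an assumption.

```python
import os


def parse_modern(value):
    """HDF5 1.10.7+, 1.12.1+, 1.13.x rules.

    Returns (use_locks, ignore_disabled_locks), or None for "use the defaults".
    """
    if value in ("FALSE", "0"):
        return (False, False)
    if value in ("TRUE", "1"):
        return (True, False)
    if value == "BEST_EFFORT":
        return (True, True)
    return None  # unset or anything else: fall back to the defaults


def parse_legacy(value):
    """HDF5 1.10.1 - 1.10.6 and 1.12.0 rules.

    Returns (use_locks, ignore_disabled_locks), or None if the variable is unset.
    """
    if value is None:
        return None
    # FALSE turns locking off, anything else turns it on.
    # Disabled file locks are never ignored in these versions.
    return (value != "FALSE", False)


def current_setting():
    """Apply the modern rules to the value in this process's environment."""
    return parse_modern(os.environ.get("HDF5_USE_FILE_LOCKING"))
```

Note how a value such as `BEST_EFFORT` means "locks on, ignore disabled locks" under the newer rules but simply "locks on" under the older ones.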

Lock disable scheme interactions

As mentioned above and reiterated here:

  • Configure-time settings set fapl defaults
  • H5Pset_file_locking() overrides configure-time defaults
  • The environment variable setting overrides all

If you want to check that file locking is on, you'll need to check the fapl setting AND check the environment variable, which can override the fapl.
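The precedence order can be written down as a small resolution function. This is only a model of the rules in this section. `LockSetting` and `effective_locking` are hypothetical names, and `None` is used here to mean "not set at this level".

```python
from typing import NamedTuple, Optional


class LockSetting(NamedTuple):
    use_locks: bool
    ignore_disabled_locks: bool


def effective_locking(
    configure_default: LockSetting,
    fapl_setting: Optional[LockSetting] = None,
    env_setting: Optional[LockSetting] = None,
) -> LockSetting:
    """Resolve the setting that applies when a file is opened."""
    setting = configure_default      # configure time sets the fapl defaults
    if fapl_setting is not None:     # H5Pset_file_locking() overrides them
        setting = fapl_setting
    if env_setting is not None:      # the environment variable overrides all
        setting = env_setting
    return setting
```

This is also why checking the fapl alone is not enough: in this model, `effective_locking(LockSetting(True, True), LockSetting(True, False), LockSetting(False, False))` returns the environment's "locks off" setting even though the fapl asked for strict locking.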

!!! WARNING !!!

Disabling the file locks is at your own risk. If more than one writer process modifies an HDF5 file at the same time, the file could be corrupted. If a reader process reads a file that is being modified by a writer, the reader process might attempt to read garbage and encounter errors or even crash.

In the case of:

  • A single process accessing a file with write access
  • Any number of processes accessing a file read-only

You can safely disable the file locking scheme.

If you are trying to set up SWMR without the benefit of the file locks, you'll just need to be extra careful that you hold to the rules for SWMR access.

Feature Matrix

The following table indicates which versions of the library support which file lock features. 1.13.0 and 1.13.1 are experimental releases (basically glorified release candidates) so they are not included here.

Locks

  • P = POSIX locks only, Windows was a no-op that always succeeded
  • WP = POSIX and Windows locks
  • (-) = POSIX no-op lock fails
  • (+) = POSIX no-op lock passes

Configure Option and Environment Variable

  • on/off = sets file locks on/off
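  • try = can also set "best effort", where locks are on but ignored if disabled

| Version | Has locks | Configure option | H5Pset_file_locking() | HDF5_USE_FILE_LOCKING |
|---------|-----------|------------------|-----------------------|-----------------------|
| 1.8.x   | No        | -                | -                     | -                     |
| 1.10.0  | P(-)      | -                | -                     | -                     |
| 1.10.1  | P(-)      | -                | -                     | on/off                |
| 1.10.2  | P(-)      | -                | -                     | on/off                |
| 1.10.3  | P(-)      | -                | -                     | on/off                |
| 1.10.4  | P(-)      | -                | -                     | on/off                |
| 1.10.5  | P(-)      | -                | -                     | on/off                |
| 1.10.6  | P(-)      | -                | -                     | on/off                |
| 1.10.7  | P(+)      | try              | Y                     | try                   |
| 1.10.8  | WP(+)     | try              | Y                     | try                   |
| 1.10.9  | WP(+)     | try              | Y                     | try                   |
| 1.12.0  | P(-)      | -                | -                     | on/off                |
| 1.12.1  | WP(+)     | try              | Y                     | try                   |
| 1.12.2  | WP(+)     | try              | Y                     | try                   |
| 1.13.2  | WP(+)     | try              | Y                     | try                   |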

Appendix: File lock implementation

The file lock system is implemented with flock(2) as the archetype since it has simple semantics and we don't need range locking. Locks are advisory on many systems, but this shouldn't be a problem for most users since the HDF5 library always respects them. If you have a program that parses or modifies HDF5 files independently of the HDF5 library, you'll want to be mindful of any potential for concurrent access across processes.

On Unix systems, we call flock() directly when it's available and pass LOCK_SH (shared lock), LOCK_EX (exclusive lock), and LOCK_UN (unlock) as described in the algorithm section. All locks are non-blocking, so we set the LOCK_NB flag. Sadly, flock(2) is not POSIX and it doesn't lock files over NFS. We didn't consider a lack of NFS support a problem since SWMR isn't supported on networked file systems like NFS (write order preservation isn't guaranteed) and flock(2) usually doesn't fail when you attempt to lock NFS files.
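The shared/exclusive, non-blocking semantics are easy to try out with `flock()` directly. The following Unix-only sketch uses Python's standard `fcntl` module on a temporary file. It is independent of the HDF5 library, and the helper names are hypothetical. Separate `os.open()` calls stand in for separate processes, since `flock()` locks belong to the open file description.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(fd, exclusive):
    """Attempt a non-blocking flock(); return False if someone else holds it."""
    operation = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, operation | fcntl.LOCK_NB)
        return True
    except OSError as exc:
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            return False
        raise


def flock_demo():
    with tempfile.NamedTemporaryFile() as tmp:
        reader_a = os.open(tmp.name, os.O_RDONLY)
        reader_b = os.open(tmp.name, os.O_RDONLY)
        writer = os.open(tmp.name, os.O_RDWR)
        try:
            results = {
                "reader_a_shared": try_lock(reader_a, exclusive=False),
                "reader_b_shared": try_lock(reader_b, exclusive=False),
                "writer_while_readers": try_lock(writer, exclusive=True),
            }
            fcntl.flock(reader_a, fcntl.LOCK_UN)
            fcntl.flock(reader_b, fcntl.LOCK_UN)
            results["writer_after_unlock"] = try_lock(writer, exclusive=True)
            return results
        finally:
            for fd in (reader_a, reader_b, writer):
                os.close(fd)
```

Two shared locks coexist, while the exclusive request fails immediately instead of waiting, which mirrors why a second HDF5 open fails right away rather than hanging.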

On Unix systems without flock(2), we implement a scheme based on fcntl(2) (Pflock() in H5system.c). On these systems we use F_SETLK (non-blocking) as the operation and set l_type in struct flock to be:

  • F_UNLCK for LOCK_UN
  • F_WRLCK for LOCK_EX
  • F_RDLCK for LOCK_SH

We set the range to be the entire file. Most Unix-like systems have flock() these days, so this system probably isn't very well tested.
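For comparison, here is a sketch of the same idea using Python's `fcntl.lockf()`, which is a wrapper around `fcntl()` record locking and uses the non-blocking `F_SETLK` operation when `LOCK_NB` is given. It is an analogue of the approach described above, not a port of `Pflock()`. The mapping table and helper names are hypothetical.

```python
import fcntl
import os
import tempfile

# flock()-style operation -> l_type value used for fcntl() record locks.
FLOCK_TO_L_TYPE = {
    fcntl.LOCK_UN: fcntl.F_UNLCK,
    fcntl.LOCK_EX: fcntl.F_WRLCK,
    fcntl.LOCK_SH: fcntl.F_RDLCK,
}


def lock_whole_file(fd, operation):
    """Apply a flock()-style operation to the entire file via fcntl() locks."""
    # LOCK_NB only makes sense when acquiring; lockf() wants a bare LOCK_UN.
    flags = operation if operation == fcntl.LOCK_UN else operation | fcntl.LOCK_NB
    try:
        # start=0 with len=0 covers the file from the first byte to EOF.
        fcntl.lockf(fd, flags, 0, 0, os.SEEK_SET)
        return True
    except OSError:
        return False


def fcntl_lock_demo():
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        try:
            locked = lock_whole_file(fd, fcntl.LOCK_EX)
            unlocked = lock_whole_file(fd, fcntl.LOCK_UN)
            return locked, unlocked
        finally:
            os.close(fd)
```

Keep in mind that `fcntl()` locks belong to the process rather than to the open file description, which is one of the semantic differences from `flock()`.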

We don't use fcntl-based open file description (OFD) locks or mandatory locking anywhere. The former scheme is non-POSIX and the latter is deprecated.

On Windows, we use LockFileEx() and UnlockFileEx() to lock the entire file (Wflock() in H5system.c). We set LOCKFILE_FAIL_IMMEDIATELY to get non-blocking locks and set LOCKFILE_EXCLUSIVE_LOCK when we want an exclusive lock. SWMR isn't well-tested on Windows, so this scheme hasn't been as thoroughly vetted as the flock-based scheme.

On non-Windows systems where neither flock(2) nor fcntl(2) is available, we substitute a no-op stub that always succeeds (Nflock() in H5system.c). In the past, the stub always failed (see the matrix for when we made the switch). We currently know of no non-Windows systems where neither call is available so this scheme is not well-tested.

One thing that should be immediately apparent to anyone familiar with file locking is that all of these schemes have subtly different semantics. We're using file locking in a fairly crude manner, though, and lock use has always been optional, so we consider this a lower-order concern.

Locks are implemented at the VFD level via lock and unlock callbacks. The VFDs that implement file locks are: core (w/ backing store), direct, log, sec2, and stdio (flock(2) locks only). The family, multi, and splitter VFDs invoke the lock callback of their underlying sub-files. The onion and MPI-IO VFDs do NOT use locks, even though they create normal, on-disk native HDF5 files. The read-only S3 VFD and HDFS VFDs do not use file locking since they use alternative storage schemes.

Disabled locks are detected by checking whether errno is set to ENOSYS after a lock call fails. This is not particularly sophisticated and was implemented as a way of working around disabled locks on popular parallel file systems.
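The "ignore disabled locks" behavior boils down to a check like the one sketched here. This is a model, not the library's code: `lock_or_ignore` and the two simulated lock calls are hypothetical, and the lock call is passed in as a function so the example can run on any file system.

```python
import errno


def lock_or_ignore(lock_call, ignore_disabled_locks):
    """Run a lock call, swallowing ENOSYS only when asked to."""
    try:
        lock_call()
    except OSError as exc:
        if ignore_disabled_locks and exc.errno == errno.ENOSYS:
            return "ignored"  # locks are disabled on the file system
        raise                 # any other failure is still an error
    return "locked"


def disabled_lock():
    """Simulates a file system where lock calls are not implemented."""
    raise OSError(errno.ENOSYS, "Function not implemented")


def contended_lock():
    """Simulates a lock that another process already holds."""
    raise OSError(errno.EWOULDBLOCK, "Resource temporarily unavailable")
```

In this model, `lock_or_ignore(disabled_lock, True)` returns `"ignored"`, while a contended lock raises regardless of the setting, since only the ENOSYS case is treated as "locks are disabled".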

One other thing to note here is that, in all of the locking schemes we use, the file locks do not survive process termination, so you don't have to worry about files being locked forever if a process exits abnormally. If a writer crashed and the library didn't clear the superblock mark, you can remove it with the h5clear command-line tool, which is built with the library.