Wednesday, April 12, 2017

Spindle

"Spindle is a tool for improving the library-loading performance of dynamically-linked HPC applications. At a high level, Spindle:
  • Provides a mechanism for scalable loading of shared libraries, executables and python files from a shared file system at scale without turning the file system into a bottleneck.
  • Is a pure user-space approach.  Users do not need to configure new file systems, load modules into their kernels or build special system components. 
  • Operates on stock binaries.  No application modification or special build flags are required. 
  • Automatically detects libraries as they’re loaded, so there is no need for pre-generated lists of libraries.  Spindle can scalably load the targets of dlopen calls, dependencies, or libraries loaded by forked child processes.
  • Is very scalable.  Under one benchmark, start-up performance without Spindle at 64 nodes was similar to start-up performance with Spindle at 1280 nodes—a performance improvement of 20X!  And many applications will likely get better benefits from Spindle than this benchmark.
  • Operates on Linux/x86_64 systems.  Cray and BlueGene/Q ports are underway.

When a dynamically-linked application starts it needs to load its dependent dynamic libraries from disk. This is done by the dynamic linker, a system library loaded into each process, which is usually named something like /lib64/ld-linux.so. The dynamic linker will search through a list of directories on what is known as its search path and test them for the existence of the application’s dynamic libraries. For an application running across N processes, this can produce O(N * num_dependent_libraries * num_search_path_entries) file operations. This number can grow very large at scale and easily overwhelm even high-bandwidth parallel file systems.

Spindle plugs into the system’s dynamic linker and intercepts these file operations. Instead of allowing every process to do file operations and flood the file system, one process (or a designated small number) will perform the file operations necessary for locating and loading dynamic libraries, then share the results of those operations with other processes in the job.

The results of those file operations will be things like file or directory contents. File contents are stored in a local location on each node, such as a ramdisk or SSD, and Spindle directs the application to load libraries from these locations rather than the shared file system."
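
To get a feel for the O(N * num_dependent_libraries * num_search_path_entries) estimate quoted above, here is a toy model in Python. It is only a sketch and not how Spindle is implemented: the function names, library names, and directories are all made up for illustration, and it counts nothing but lookup probes against the shared file system (it ignores reading the library contents and the cost of distributing results between nodes).

```python
def probes_per_process(libs, search_path, existing):
    """Count the path probes one process makes while resolving its libraries.

    For each library, directories are tried in search-path order until the
    library is found, mimicking how a dynamic linker walks its search path.
    """
    probes = 0
    for lib in libs:
        for directory in search_path:
            probes += 1  # one open()/stat() attempt on the shared file system
            if (directory, lib) in existing:
                break  # found it, stop searching for this library
    return probes


def naive_total(num_procs, libs, search_path, existing):
    """Every process searches on its own, so the probes multiply by N."""
    return num_procs * probes_per_process(libs, search_path, existing)


def cooperative_total(libs, search_path, existing):
    """Assumption: one process searches and shares the results with the rest,
    so the shared file system sees a single process's worth of probes."""
    return probes_per_process(libs, search_path, existing)


# Hypothetical job: 3 libraries, 4 search-path entries, and every library
# lives in the last directory (the worst case for the search).
libs = ["libA.so", "libB.so", "libC.so"]
search_path = ["/dir1", "/dir2", "/dir3", "/dir4"]
existing = {("/dir4", lib) for lib in libs}

naive = naive_total(1000, libs, search_path, existing)
cooperative = cooperative_total(libs, search_path, existing)
print(naive, cooperative)
```

In this model the naive count grows linearly with the number of processes, while the cooperative count stays fixed at whatever a single process needs. Real jobs have far more libraries and longer search paths, which is where the shared file system starts to hurt.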

https://computation.llnl.gov/projects/spindle

https://github.com/hpc/Spindle

https://computation.llnl.gov/projects/spindle/spindle-paper.pdf
