SYCL BLAS is a work-in-progress research project, developed in an ongoing collaboration with the High Performance Computing & Architectures (HPCA) group at the Universitat Jaume I (UJI).
SYCL BLAS is written using modern C++. The current implementation uses C++11 features but we aim to move to C++14 in the short term. See Roadmap for details on the current status and plans for the project.
Nowadays, most numerical computations are built on a set of standard libraries that implement the most common operations. Different libraries exist for dense matrices (BLAS, LAPACK, ScaLAPACK, ...) and for sparse matrices (SparseBLAS, ...). In addition, there are vendor implementations tuned to the features of each platform:
- For multicore CPUs: ACML (AMD), ATLAS, Intel-MKL, OpenBLAS, ...
- For GPUs: cuBLAS (NVIDIA), clBLAS, MAGMA, ...
On GPUs, data transfers to and from the device and the granularity of the kernels play an important role in performance. On the one hand, to reduce the communication cost, most of the data, even scalars, should be kept on the device. On the other hand, increasing the granularity of the kernels (doing more work per kernel launch) lets the CPU complete other tasks while the GPU is computing, or enter an energy-efficient C-state, which reduces energy consumption.
Increasing the granularity of the kernels is a complex task in which many aspects must be considered, such as the dependencies between kernels, the grid topology and the grid sizes. This complexity explains why fused kernels are usually written by hand. An alternative that simplifies this task is to build an expression tree containing every individual operation required to solve a problem. The compiler could then analyse this structure to decide how to merge the different kernels and which grid topology is best for executing the fused kernel. The use of expression trees is one of the most important features of SYCL-BLAS.
SYCL BLAS uses C++ expression tree templates to generate SYCL kernels via kernel composition. Expression tree templates are a widely used technique for implementing expressions in C++ that eases the development and composition of operations. In particular, kernel composition in SYCL has been used in various projects to create efficient domain-specific embedded languages that let users easily fuse GPU kernels.
SYCL-BLAS is a header-only library. All the relevant files can be found in the include directory. There are four components in SYCL-BLAS: the View, the Operations, the Executors and the Interface itself.
https://github.com/codeplaysoftware/sycl-blas