Thursday, April 13, 2017

CLBlast

"CLBlast is a modern, lightweight, performant and tunable OpenCL BLAS library written in C++11. It is designed to leverage the full performance potential of a wide variety of OpenCL devices from different vendors, including desktop and laptop GPUs, embedded GPUs, and other accelerators. CLBlast implements BLAS routines: basic linear algebra subprograms operating on vectors and matrices.

This preview version is not yet tuned for all OpenCL devices: out-of-the-box performance on some devices might be poor. See below for a list of already-tuned devices and for instructions on how to perform the tuning yourself and contribute the results to future releases of the CLBlast library.

Why CLBlast and not clBLAS or cuBLAS?

Use CLBlast instead of clBLAS:
  • When you care about achieving maximum performance.
  • When you want to be able to inspect the BLAS kernels or easily customize them to your needs.
  • When you run on exotic OpenCL devices that you need to tune for yourself.
  • When you are still running on OpenCL 1.1 hardware.
  • When you value an organized and modern C++ codebase.
  • When you target Intel CPUs and GPUs, or embedded devices.
  • When you can benefit from the increased performance of half-precision fp16 data types.
Use CLBlast instead of cuBLAS:
  • When you want your code to run on devices other than NVIDIA CUDA-enabled GPUs.
  • When you want to tune for a specific configuration (e.g. rectangular matrix-sizes).
  • When you sleep better if you know that the library you use is open-source.
  • When you are using OpenCL rather than CUDA.
When not to use CLBlast:
  • When you run on NVIDIA's CUDA-enabled GPUs only and can benefit from cuBLAS's assembly-level tuned kernels.
https://github.com/CNugteren/CLBlast

CLBlast: A Tuned OpenCL BLAS Library - https://hgpu.org/?p=17236
