Tensor algebra library routines for shared memory systems. The TAL_SH library provides an API for performing basic tensor algebra operations on multicore CPUs, NVIDIA GPUs, Intel Xeon Phi, and other accelerators. Basic operations, such as tensor contraction, tensor product, tensor addition, tensor transpose, and multiplication by a scalar, operate on locally stored tensors. On heterogeneous nodes, tensor operations run on accelerators asynchronously with respect to the CPU host. Both Fortran and C/C++ APIs are provided. The library follows a simplified object-oriented design, but without explicit object-oriented syntax.
https://github.com/DmitryLyakh/TAL_SH
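For readers unfamiliar with the operations listed above, here is a minimal pure-Python sketch of three of them: a contraction, a tensor (outer) product, and a scaled addition. It does not use TAL_SH and does not reflect its API; the function names (`contract_acd_cdb`, `outer_product`, `add_scaled`) and the nested-list storage are assumptions made purely for illustration.

```python
# Illustrative only: hypothetical helper names, not part of TAL_SH.

def contract_acd_cdb(left, right):
    """Contraction D[a][b] = sum over c, d of left[a][c][d] * right[c][d][b]."""
    na, nc, nd = len(left), len(left[0]), len(left[0][0])
    nb = len(right[0][0])
    out = [[0.0] * nb for _ in range(na)]
    for a in range(na):
        for b in range(nb):
            acc = 0.0
            # The contracted indices c and d are summed away.
            for c in range(nc):
                for d in range(nd):
                    acc += left[a][c][d] * right[c][d][b]
            out[a][b] = acc
    return out


def outer_product(u, v):
    """Tensor product of two vectors: P[i][j] = u[i] * v[j]."""
    return [[x * y for y in v] for x in u]


def add_scaled(dst, src, alpha):
    """Tensor addition with a scalar prefactor: returns dst + alpha * src."""
    return [[d + alpha * s for d, s in zip(drow, srow)]
            for drow, srow in zip(dst, src)]


if __name__ == "__main__":
    left = [[[1.0, 2.0], [3.0, 4.0]]]          # shape (1, 2, 2)
    right = [[[1.0], [0.0]], [[0.0], [1.0]]]   # shape (2, 2, 1)
    print(contract_acd_cdb(left, right))
    print(outer_product([1.0, 2.0], [3.0, 4.0]))
    print(add_scaled([[1.0, 1.0]], [[2.0, 4.0]], 0.5))
```

A library such as TAL_SH would typically map operations like these onto optimized kernels and, on accelerators, run them asynchronously; the explicit loops here only show what is being computed.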
cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs
https://hgpu.org/?p=17219
http://on-demand.gputechconf.com/gtc/2017/presentation/s7255-antti-pekka-hynninen-cutt-a-high-performance-tensor-transpose-library-for-gpus.pdf
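As a rough picture of what a tensor transpose does, the sketch below permutes the dimensions of a tensor stored in a flat, row-major list on the CPU. It is not the cuTT API and makes no attempt at GPU performance; the names `strides_row_major` and `permute`, the row-major layout, and the convention that output dimension `k` is taken from input dimension `perm[k]` are all assumptions chosen for this example.

```python
from itertools import product

# Illustrative CPU reference only: hypothetical names, not the cuTT API.

def strides_row_major(dims):
    """Element strides for a row-major (last index fastest) layout."""
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return strides


def permute(data, dims, perm):
    """Return (out_dims, out) where output dimension k is input dimension perm[k]."""
    out_dims = [dims[p] for p in perm]
    in_strides = strides_row_major(dims)
    out = [None] * len(data)
    # product() walks output indices in row-major order, so `pos` is the
    # linear offset of each output element.
    for pos, out_index in enumerate(product(*(range(n) for n in out_dims))):
        src = sum(out_index[k] * in_strides[perm[k]] for k in range(len(perm)))
        out[pos] = data[src]
    return out_dims, out


if __name__ == "__main__":
    # A 2 x 3 matrix [[0, 1, 2], [3, 4, 5]] transposed to 3 x 2.
    print(permute([0, 1, 2, 3, 4, 5], [2, 3], [1, 0]))
```

The scattered reads in the inner loop hint at why this operation is hard to do efficiently: for a general permutation, either the reads or the writes are strided, which is the memory-access problem a GPU transpose library has to address.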