gemm
Here are 24 public repositories matching this topic...
Tuned OpenCL BLAS
Updated Aug 16, 2020 - C++
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
Updated Nov 9, 2019 - Nim
BLISlab: A Sandbox for Optimizing GEMM
Updated Aug 6, 2019 - C
Stretching GPU performance for GEMMs and tensor contractions.
Updated Aug 25, 2020 - Python
DBCSR: Distributed Block Compressed Sparse Row matrix library
Updated Aug 23, 2020 - Fortran
Code for benchmarking GPU performance based on cublasSgemm and cublasHgemm
Updated Jul 7, 2017 - Cuda
This repository targets performance optimization of the OpenCL GEMM function. It compares several libraries (clBLAS, CLBlast, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA)) across different matrix sizes, hardware vendors, and operating systems. Out-of-the-box x86_64 binaries are provided for MSVC, MinGW, and Linux (CentOS).
Updated Mar 28, 2019 - C
Low Precision Arithmetic for Convolutional Neural Network Inference
Updated Oct 29, 2017 - C++
Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.
Updated Aug 20, 2020 - C++
My experiments with convolution
Updated Jun 21, 2020 - C++


We should prefix CMake build options with "CT2_", e.g. `CT2_WITH_MKL` instead of `WITH_MKL`. This is good practice to avoid possible conflicts with other projects. The usage should then be updated in several places:
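For illustration, the renamed option would look like this in a CMakeLists.txt (a sketch only: the `WITH_MKL` name comes from the text above; the option description and the guarded body are assumptions):

```cmake
# Prefixed option replaces the old unprefixed WITH_MKL.
option(CT2_WITH_MKL "Compile with the Intel MKL backend" ON)

if(CT2_WITH_MKL)
  # Hypothetical guarded section; the real project would keep
  # whatever find_package/target logic WITH_MKL used to gate.
  add_compile_definitions(CT2_WITH_MKL)
endif()
```

Callers would then configure with `cmake -DCT2_WITH_MKL=ON ..` instead of `-DWITH_MKL=ON`; keeping the old name as a deprecated alias for one release would ease the transition.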