whisper.cpp / ggml / src / ggml-cuda / ggml-cuda.cu

Commit History

HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (llama/12209)
18afa4b

uvos committed on

ggml : upgrade init_tensor API to return a ggml_status (llama/11854)
d6b6852

William Tambellini and slaren committed on

cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)
f959b90

cmdr2 committed on

cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
67e8c32

cmdr2 committed on

CUDA: add option to compile without FlashAttention (llama/12025)
fbc5f16

JohannesGaessler committed on

cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)
6cb8158

Garf committed on

MUSA: support ARM64 and enable dp4a, etc. (llama/11843)
ab96dac

Bodhi Hu committed on

musa: bump MUSA SDK version to rc3.1.1 (llama/11822)
ff2d3eb

R0CKSTAR committed on

HIP: Switch to std::vector in rocblas version check (llama/11820)
e144c94

uvos committed on

CUDA: fix CUDART_VERSION checks (llama/11821)
04f123a

JohannesGaessler committed on

CUDA: use arch list for compatibility check (llama/11775)
b88e163

JohannesGaessler and Diego Devesa committed on

CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)
78e36a2

JohannesGaessler committed on

HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (llama/11601)
4850c24

uvos committed on

HIP: Prepare reduction operators for wave 64
bc1c1a4

uvos committed on

CUDA/HIP: add warp_size to cuda_device_info
e538e2c

uvos committed on

HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (llama/11080)
82bb7f3

Nikita Sarychev committed on

AMD: parse the architecture as supplied by gcnArchName (llama/11244)
04b01d8

Haus1 committed on

HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420)
2cc4df4

uvos committed on

hip : Add hipGraph and VMM support to ROCM (llama/11362)
089afa0

uvos committed on

CUDA: fix FP16 cuBLAS GEMM (llama/11396)
7b7c5d3

JohannesGaessler committed on

rocBLAS: Avoid fp32->fp16->fp32 conversion on CDNA (llama/11356)
6f5687a

uvos committed on

CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380)
855a9fe

JohannesGaessler committed on

CUDA: backwards pass for misc. ops, add tests (llama/11257)
2fbcec1

JohannesGaessler committed on

RoPE: fix back, CUDA support for back + noncont. (llama/11240)
131a21e

JohannesGaessler committed on

cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (llama/11042)
25882f6

Andreas Kieslinger and slaren committed on

CUDA: add BF16 support (llama/11093)
961ef57

JohannesGaessler committed on

CUDA: rename macros to avoid conflicts with WinAPI (llama/10736)
8544072

Andreas Kieslinger committed on

ggml : refactor online repacking (llama/10446)
163128e

Djip007 and ggerganov committed on

Add some minimal optimizations for CDNA (llama/10498)
bf49bbe

uvos committed on

ggml : add support for dynamic loading of backends (llama/10469)
b73266f

Diego Devesa and ggerganov committed on

CUDA: fix MMV kernel being used for FP16 src1 (llama/10357)
af4dff1

JohannesGaessler committed on

CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318)
e446f60

JohannesGaessler committed on

ggml : build backends as libraries (llama/10256)
3dc93f3

Diego Devesa, ggerganov, and R0CKSTAR committed on