Commit History
ggml : fix field name when new ggml_backend (llama/14944) 685748d
AN Long committed on
CUDA: attention sinks for mma FlashAttention (llama/15157) 0ab9aba
opencl: support sink in `soft_max` (attn sinks) (llama/15152) d8664e4
lhez committed on
vulkan: support fattn sinks (llama/15126) d7e9115
vulkan: Add env var to disable host visible vidmem (llama/15109) 5ec4382
HIP: add cmake option to enable compiler output of kernel resource usage metrics (llama/15103) 577f7e4
ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094) f84562e
Christian Kastner committed on
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131) 1d24833
fix profiling crash (llama/15072) 67ec576
opencl: add `swiglu_oai` and `add_id` (llama/15121) 1c97db6
lhez committed on
ggml : fix fallback to CPU for unsupported ops (llama/15118) 2b7ae5e
Diego Devesa committed on
CANN: add support for ACL Graph (llama/15065) 137a0dc
Chenguang Li committed on
sycl: fix mul_mat selection (llama/15092) 344310a
Romain Biessy committed on
cmake: Add GGML_BACKEND_DIR option (llama/15074) 6e460b6
Christian Kastner committed on
vulkan: fix build when using glslang that does not support coopmat2 (llama/15062) 863e083
vulkan: Use coopmat2 for conv2d (llama/14982) 6df82f4
opencl: fix adreno compiler detection logic (llama/15029) e6a209e
lhez committed on
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (llama/15035) 9e85264
cuda: make im2col a little faster (llama/15025) 9a85c65
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (llama/15038) cc3a2ed
vulkan: coopmat2 mul_mat optimizations (llama/14934) ca86566
vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (llama/15015) d4c4115
vulkan: optimizations for direct convolution (llama/14933) 215f463
CUDA: fix MMQ nwarps for AMD with warp_size==32 (llama/15014) fbc3cd1
opencl: add f16 for `add`, `sub`, `mul`, `div` (llama/14984) 4dc1834
lhez committed on
ggml : Q2k interleaving implementation - x86/x64 SIMD (llama/14373) e2965b0
Vulkan: Fix minor debug mode issues (llama/14899) a81bc86
CANN: Improve loading efficiency after converting weights to NZ format. (llama/14985) 7612978
opencl: add `mul_mat_f32_f32_l4_lm` and `mul_mat_f16_f32_l4_lm` (llama/14809) 05577c3
lhez committed on
HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (llama/14949) 149f5a5
CUDA: skip masked KV slices for all FA kernels (llama/14924) 0c60f80
HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945) e37eff3
HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (llama/14930) f9dbd96
HIP: Ignore unsupported unroll transformation in fattn-vec (llama/14931) 8e133f7
CANN: Add ggml_set_rows (llama/14943) fa22f70
cuda : add softcap fusion (llama/14907) 2237878
Sigbjørn Skjæret committed on
CUDA: add roll (llama/14919) d41a4ec
ggml-cpu : deduplicate scalar implementations (llama/14897) 1d58d7c
xctan committed on
SYCL: Add set_rows support for quantized types (llama/14883) c55b72b
CUDA: fix pointer incrementation in FA (llama/14916) eb84e7e
sycl: refactor quantization to q8_1 (llama/14815) 31edd77
Alberto Cabrera Pérez committed on
cmake : Fix BLAS link interface (ggml/1316) 3020711
Kai Pastor committed on
vulkan : fix 32-bit builds (ggml/1313) 96b66fd
Kai Pastor committed on
scripts : update sync scripts 311eccd
node : add win platform check for require path (#3363) 29b8653
ci : update main-cuda.Dockerfile (#3371) e79709c
ustas committed on
whisper : fix crash in GPU device selection on multi-GPU systems (#3372) 0869200
Dw9 committed on