metal : simplify kernel arguments using a struct (ggml/3229) (llama/12194) 092277a BB-fat alexju committed on Mar 7, 2025
opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (llama/12217) 94449e3 lhez committed on Mar 7, 2025
cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (llama/12094) dc68418 xiaofei Ray Lee committed on Mar 6, 2025
CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222) 4dc8a81 JohannesGaessler committed on Mar 6, 2025
HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it (llama/12209) 18afa4b uvos committed on Mar 6, 2025
opencl : fix `ulong` kernel args being set from `int` variables (llama/12174) 67ffff0 linehill committed on Mar 6, 2025
ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154) 05466a9 Rémy O committed on Mar 6, 2025
SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201) 723b8b4 qnixsynapse committed on Mar 5, 2025
ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) c9a49f9 vmobilis committed on Mar 7, 2025
ggml : portability fixes for VS 2017 (llama/12150) 49e3343 mgroeber9110 Marcus Groeber committed on Mar 4, 2025
HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032) a027c1d David Huang committed on Mar 3, 2025
SYCL: Move CPY kernels to a separate file and add few missing kernels (llama/12133) 1d6d451 qnixsynapse committed on Mar 3, 2025
ggml-backend : keep paths in native string type when possible (llama/12144) 6e89d8c Diego Devesa committed on Mar 2, 2025
CUDA: compress mode option and default to size (llama/12029) 4ec988a Green-Sky committed on Mar 1, 2025
ggml : upgrade init_tensor API to return a ggml_status (llama/11854) d6b6852 William Tambellini slaren committed on Feb 28, 2025
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595) d7d82b9 Rémy O committed on Feb 28, 2025
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098) 0b52fcc JohannesGaessler committed on Feb 28, 2025
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064) 459beb1 Prashant Vithule vithulep committed on Feb 28, 2025
vulkan: fix assertion when qy_needs_dequant (llama/12068) 271c7e4 jeffbolznv committed on Feb 25, 2025
cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129) f959b90 cmdr2 committed on Feb 28, 2025
cuda/cpu: Increase support for fp16 unary operations (ggml/1125) 67e8c32 cmdr2 committed on Feb 28, 2025
whisper : support GGML_BACKEND_DL (#2843) 2e6437e Diego Devesa ggerganov committed on Feb 27, 2025
Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121) 2b94a24 cmdr2 committed on Feb 25, 2025
metal : copy kernels for quant to F32/F16 conversions (llama/12017) 6c8e7ec Garf ggerganov committed on Feb 25, 2025
opencl: fix for small models (llama/11950) 4532dc6 lhez Shawn Gu Skyler Szot committed on Feb 24, 2025
Optimize mul_mat for Q4_0 on Intel GPU (llama/12035) 14fd317 Neo Zhang Jianyu arthw committed on Feb 24, 2025
ggml-cpu: Support s390x SIMD Instruction Set (llama/12019) 4aa54ec Aaron Teo Jinyang He junchao-zhao committed on Feb 22, 2025
CUDA: add option to compile without FlashAttention (llama/12025) fbc5f16 JohannesGaessler committed on Feb 22, 2025
CUDA: optimize FA for GQA + large batches (llama/12014) 6662d54 JohannesGaessler committed on Feb 22, 2025
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000) 6cb8158 Garf committed on Feb 22, 2025
CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984) 6641178 PureJourney JohannesGaessler committed on Feb 21, 2025
MUSA: support ARM64 and enable dp4a etc. (llama/11843) ab96dac Bodhi Bodhi Hu committed on Feb 21, 2025
ggml-cpu: Add CPU backend support for KleidiAI library (llama/11390) 9de6d81 Charles Xu committed on Feb 20, 2025
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (llama/11917) 1a1acd2 Prashant Vithule vithulep ggerganov committed on Feb 20, 2025
CUDA: use async data loading for FlashAttention (llama/11894) 5b9980d JohannesGaessler Diego Devesa committed on Feb 17, 2025
vulkan: implement several ops relevant for ggml_opt (llama/11769) 3c2171d Rémy O committed on Feb 17, 2025
vulkan: support multi/vision rope, and noncontiguous rope (llama/11902) 1c7a669 jeffbolznv committed on Feb 16, 2025