cmake : fix compile assumptions for power9/etc (#2777) 4683df3 midnight committed on Feb 5, 2025
HIP: fix flash_attn_stream_k_fixup warning (llama/11604) acfd94f JohannesGaessler committed on Feb 2, 2025
CUDA/HIP: add support for selectable warp size to mmv (llama/11519) ed08269 uvos committed on Feb 2, 2025
HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (llama/11601) 4850c24 uvos committed on Feb 2, 2025
CUDA: use mma PTX instructions for FlashAttention (llama/11583) f328957 JohannesGaessler Diego Devesa committed on Feb 2, 2025
`ci`: use sccache on windows instead of ccache (llama/11545) 9ed1962 Olivier Chafik committed on Jan 31, 2025
vulkan: implement initial support for IQ2 and IQ3 quantizations (llama/11360) bd93c1b Rémy Oudompheng jeffbolznv committed on Jan 29, 2025
vulkan: Catch pipeline creation failure and print an error message (llama/11436) d4f6b2c jeffbolznv committed on Jan 29, 2025
HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (llama/11080) 82bb7f3 Nikita Sarychev committed on Jan 28, 2025
SYCL : SOFTMAX F16 mask support and other fixes (llama/11261) 8aaf0c8 qnixsynapse committed on Jan 28, 2025
AMD: parse the architecture as supplied by gcnArchName (llama/11244) 04b01d8 Haus1 committed on Jan 27, 2025
metal: Handle null returned from MTLCreateSystemDefaultDevice() (llama/11441) 4e38ed4 Ihar Hrachyshka committed on Jan 27, 2025
cmake: add ggml find package (llama/11369) ca6577f bandoti ggerganov committed on Jan 26, 2025
HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420) 2cc4df4 uvos committed on Jan 25, 2025
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (llama/11356) 6f5687a uvos committed on Jan 24, 2025
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380) 855a9fe JohannesGaessler committed on Jan 24, 2025
vulkan: sort shaders for more deterministic binary (llama/11315) d7c0046 jeffbolznv committed on Jan 23, 2025
rpc : better caching of the base buffer pointer (llama/11331) 81a6cae rgerganov committed on Jan 21, 2025
cmake : add sanitizer flags for llama.cpp (llama/11279) 3547979 ggerganov JohannesGaessler committed on Jan 18, 2025
vulkan: fix coopmat2 flash attention for non-contiguous inputs (llama/11281) e0e73fa jeffbolznv committed on Jan 18, 2025
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (llama/11166) 3bb9e77 jeffbolznv committed on Jan 16, 2025
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (llama/11206) ee122d3 jeffbolznv committed on Jan 16, 2025
vulkan: optimize coopmat2 q2_k dequant function (llama/11130) d49a569 jeffbolznv committed on Jan 16, 2025
CUDA: backwards pass for misc. ops, add tests (llama/11257) 2fbcec1 JohannesGaessler committed on Jan 16, 2025
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (llama/11227) bf3dc93 fj-y-saito ggerganov committed on Jan 16, 2025
RoPE: fix back, CUDA support for back + noncont. (llama/11240) 131a21e JohannesGaessler committed on Jan 15, 2025
ggml : add option to not print stack on abort (ggml/1081) 9b2706e William Tambellini Diego Devesa committed on Jan 23, 2025
ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) 8e57313 issixx issi committed on Jan 17, 2025
GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030) 92311a3 JohannesGaessler committed on Jan 14, 2025
ggml : add opencl backend (skip) (llama/10693) 226358f lhez Skyler Szot Shangqing Gu Alexander Angus Hongqiang Wang Max Krasnyansky committed on Jan 14, 2025
cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (llama/11042) 25882f6 Andreas Kieslinger slaren committed on Jan 13, 2025