Commit History
ggml : full ALiBi support (llama/7192) 192bda4
Introduction of CUDA Graphs to LLama.cpp (llama/6766) 08fc76d
agray3 slaren committed on
Add an option to build without CUDA VMM (llama/7067) 38b1143
ggml : add Flash Attention (llama/5021) 34d3b03
ggml : group all experts in a single ggml_mul_mat_id (llama/6505) f0b5c67
CUDA: fix matrix multiplication logic for tests (llama/6667) 6ccb5a5
feat: implemented sigmoid function (ggml/806) cd0c122
Justina Cho committed on
llama : add Command R Plus support (llama/6491) 8cf7097 unverified
ggml : mul_mat_id use the same tensor for all the experts (llama/6387) 26fdc9f unverified
ggml: bypass code incompatible with CUDA < 11.1 (#2020) 32f4e35 unverified
sync : ggml (#2001) cbbfa9e unverified
ggml : reuse quantum structs across backends (llama/5943) bb0625f unverified
1.5 bit: we can do even better (llama/5999) 36cc71e unverified
Better 1.5 bit quantization (llama/5971) f3a62cc unverified
ggml : add ggml-common.h to deduplicate shared code (llama/5940) 0a37735 unverified
ggml : introduce ggml_status (ggml/750) 151c676 unverified
cuda : fix data race in soft max (llama/5853) d1b60e4 unverified
slaren committed on
ggml : IQ3_S improvements (llama/5829) 06a8e30 unverified
ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (llama/5760) 9a07f42 unverified
IQ4_XS: a 4.25 bpw quantization (llama/5747) 0ee1bfb unverified
cuda : replace remaining shfl_xor with calls to warp_reduce functions (llama/5744) 753b30d unverified
Engininja2 committed on
CUDA: fix DEBUG_CUDA_MALLOC (llama/5729) f18f386 unverified
code : normalize enum names (llama/5697) 93e0830 unverified
IQ3_S: a much better alternative to Q3_K (llama/5676) 32589c9 unverified
Introduce backend GUIDs (ggml/743) a7eb9f6 unverified
UEXTM.com slaren committed on
ggml : always define ggml_fp16_t as uint16_t (llama/5666) bc567d3 unverified
sync : llama.cpp (ggml/0) f8e8d34 unverified
cuda : ignore peer access already enabled errors (llama/5597) a817d85 unverified
slaren committed on
ci : enable -Werror for CUDA builds (llama/5579) df03a10 unverified
cuda, metal : fix nans in soft_max (llama/5574) 44164ac unverified
1.5 bit quantization (llama/5453) 9c3aa6a unverified
ggml : add ALiBi support for ggml_soft_max_ext (llama/5488) 26c019a unverified
cuda : print message when initialization fails (llama/5512) 1f047ca unverified
slaren committed on
CUDA: mul_mat_vec_q tiling, refactor mul mat logic (llama/5434) c0cfa9b unverified
CUDA: more warps for mmvq on NVIDIA (llama/5394) 7ab774c unverified
CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (llama/5386) 3ff7660 unverified
CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370) 7aa3216 unverified
CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351) ae45b38 unverified
cuda : fix LLAMA_CUDA_F16 (llama/5262) 5fd8fb7 unverified
slaren committed on
llava : add MobileVLM support (llama/5132) f17a416 unverified
JidongZhang-THU slaren committed on
sync : ggml (llama/0) cdb7964 unverified
SOTA 3-bit quants (llama/5196) 4649943 unverified
`ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686) 75d438c unverified
John Balis slaren committed on
ggml : add Vulkan backend (llama/2059) 5a97aba unverified
cuda : fix tensor size calculation for non-split buffer (llama/5145) 8f3eb65 unverified
slaren committed on
cuda : fix 2-bit quants on amd hip (llama/5105) aadbd67 unverified
Engininja2 committed on