Commit History
ggml-alloc : allocate all leafs as if they were inputs (ggml/731) a512417 unverified
slaren committed on
talk-llama : sync llama.cpp aa42df9 unverified
sync : ggml be7d266 unverified
ggml-backend : sync remnant 3f5165f unverified
CUDA: mul_mat_vec_q tiling, refactor mul mat logic (llama/5434) c0cfa9b unverified
vulkan: only use M-sized matmul on Apple GPUs (llama/5412) 350284e unverified
Sergio López committed on
ggml : fix compile warnings (unused vars) (llama/4966) 97fa2e3 unverified
ggml : add mmla kernels for quantized GEMM (llama/4966) 0d50a29 unverified
snadampal committed on
metal : use autoreleasepool to avoid memory leaks (llama/5437) c276f12 unverified
ggml-alloc : v3 (ggml/727) 5cffd6f unverified
slaren committed on
examples : added audio_ctx argument to main and server (#1857) 469988b unverified
metal : option to embed MSL source into compiled binary (#1842) a46b62a unverified
Didzis Gosko committed on
examples : initialize context params properly (#1852) 3443ee7 unverified
talk-llama : sync llama.cpp e6d6e1d unverified
sync : ggml 94800c5 unverified
src : relocate new backend sources 44cd2d4 unverified
ggml : fix `error C2078: too many initializers` for MSVC ARM64 (llama/5404) 8ebb36c unverified
Michael Podvitskiy committed on
CUDA: more warps for mmvq on NVIDIA (llama/5394) 7ab774c unverified
CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (llama/5386) 3ff7660 unverified
Basic Vulkan Multi-GPU implementation (llama/5321) 5d130aa unverified
CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370) 7aa3216 unverified
Slight quantization improvement for Q4_K and Q5_K (llama/5361) e3cd020 unverified
CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351) ae45b38 unverified
ggml : make use of ggml-quants.h possible in C++ code (llama/5338) 963ade6 unverified
ggml : avoid duplicating function calls using MIN/MAX macros (llama/5325) 9bb2b0a unverified
iq2_xxs: tune quantization (llama/5320) 11e5f6b unverified
cuda : fix LLAMA_CUDA_F16 (llama/5262) 5fd8fb7 unverified
slaren committed on
metal : add im2col F32 dst support (llama/5132) 26aec77 unverified
llava : add MobileVLM support (llama/5132) f17a416 unverified
JidongZhang-THU, slaren committed on
ggml : limit n_threads to the max n_tasks (llama/5238) 2645c33 unverified
slaren committed on
kompute : llama-bench support and ggml_cpu_has_kompute() (llama/5226) 0c9c434 unverified
ggml : add abort_callback for cpu backend (ggml/725) a8ea91b unverified
Michael Podvitskiy committed on
extra : update sync scripts d99e873 unverified
server : allow CORS request with authorization headers (#1850) 16a6639 unverified
Valentin Gosu committed on
whisper : expose CUDA device setting in public API (#1840) d13ee66 unverified
Didzis Gosko committed on
make : add macOS deployment target option (#1839) 9c90601 unverified
Didzis Gosko committed on
talk-llama : stream response (#1121) 2193f2b unverified
sync : ggml (#0) fded75b unverified
ggml : fix IQ3_XXS on Metal (llama/5219) f066321 unverified
sync : ggml (llama/0) cdb7964 unverified
SOTA 3-bit quants (llama/5196) 4649943 unverified
ggml alloc: Fix for null dereference on alloc failure (llama/5200) 8181686 unverified
Paul Tsochantaris committed on
Nomic Vulkan backend (llama/4456) f5fd92d unverified
ggml : add max buffer sizes to opencl and metal backends (llama/5181) 3d354d0 unverified
slaren committed on
metal : free metal objects (llama/5161) ea7167a unverified
Paul Tsochantaris committed on
gguf : fix comparison (ggml/715) 80cfca4 unverified
`ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686) 75d438c unverified
John Balis, slaren committed on