Commit History
ggml : fix and optimize ppc64le (ggml/849) e3d09d2
Hong Bo PENG committed on
ggml : remove duplicate include of ggml-common.h (ggml/853) 8c3ae74
remove global variables (llama/7710) 4cb73ba
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921) 5931562
metal : utilize max shared memory for mul_mat_id (llama/7935) d4b3604
rpc : fix ggml_backend_rpc_supports_buft() (llama/7918) 56e6751
move BLAS to a separate backend (llama/6210) c773aa9
CUDA: fix broken oob check for FA vec f32 kernel (llama/7904) efbb7be
tests : add non-cont unary tests (llama/7857) 6dc2887
ggml : improve ggml_is_contiguous logic (llama/7856) ea3aa71
vulkan: select only one device for single gpu with multiple drivers (llama/7582) ee56a37
Update Vulkan RoPE implementation (llama/7818) 71850e7
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860) 154bf2b
CUDA: use tensor cores for MMQ (llama/7676) 78a5b67
use the correct SYCL context for host USM allocations (llama/7777) 9f87c2f
CUDA: revise q8_1 data layout for mul_mat_q (llama/7824) fcfd59e
vulkan : reuse parent extra for views (llama/7806) b9b60de
fix softmax r2r result wrong issue (llama/7811) c3a7159
CUDA: refactor mmq, dmmv, mmvq (llama/7716) 849ff52
ggml : refactor rope norm/neox (llama/7634) ded0c68
Allow number of nodes in CUDA graph to change (llama/7738) 6124287
agray3 committed on
ggml : remove OpenCL (llama/7735) 4ff3b72
ggml : prevent builds with -ffinite-math-only (llama/7726) 154f0f8
llama : offload to RPC in addition to other backends (llama/7640) eab8082
ggml : use OpenMP as a thread pool (llama/7606) 7e5d850
Vulkan Mixture of Experts (MoE) support (llama/7628) ad9ee26
kompute : implement op_getrows_f32 (llama/6403) fa0872f
woachk committed on
fix bug introduced in using calloc (llama/7701) f22c7e4
Dave Airlie committed on
Fix FlashAttention debug test, FP32 assert (llama/7684) 1bed92f
CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (llama/7681) d4c0faf
CUDA: quantized KV support for FA vec (llama/7527) 315df8c
ggml : fix loongson compile warnings (llama/7537) c1442f3
faster avx512 exp implementation (llama/7551) 6dbbbab
ggml : fix loongarch build (O2 issue) (llama/7636) 133ffbf
junchao-loongson committed on
metal : remove invalid asserts (llama/7617) 562afce
metal : add missing asserts (llama/7617) be552ab
ggml : fix YARN + add tests + add asserts (llama/7617) 15da5f7
cuda : non-cont concat support (llama/7610) 64d3007
llama-bench : add support for the RPC backend (llama/7435) d460266
ggml : use atomic_flag for critical section (llama/7598) 68c6582
slaren committed on
examples : adapt to new ggml_concat (ggml/0) 36af6c5
ggml : fix typo in ggml.c (llama/7603) f06f1cb
Align GEMM dispatch (llama/7566) 2171dc6
sycl : fix assert (llama/7563) b4fb287
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (llama/7552) da90a1e
rpc : resource management rework (llama/7562) 7571b13
fix ggml_sycl_mul_mat_id() to match the change of api (llama/7436) f0ee71c
Neo Zhang committed on