Commit History
kompute : improve backend to pass test_backend_ops (llama/10542) c8008b8
CANN: Fix SOC_TYPE compile bug (llama/10519) 7f24ebb
leo-pony committed
CANN: ROPE operator optimization (llama/10540) 63ee002
Add some minimal optimizations for CDNA (llama/10498) bf49bbe
uvos committed
metal : fix group_norm support condition (llama/0) 20ee62d
vulkan: define all quant data structures in types.comp (llama/10440) cea89af
vulkan: Handle GPUs with less shared memory (llama/10468) 18a0ad1
vulkan: further optimize q5_k mul_mat_vec (llama/10479) cb018d4
vulkan: skip integer div/mod in get_offsets for batch_idx==0 (llama/10506) c6d15e0
vulkan: optimize Q2_K and Q3_K mul_mat_vec (llama/10459) c032c06
mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (llama/10516) f2a87fc
R0CKSTAR committed
vulkan: fix group_norm (llama/10496) 8f5eeb8
cmake : enable warnings in llama (llama/10474) 26a670b
ggml-cpu: cmake add arm64 cpu feature check for macos (llama/10487) 6d586a0
Charles Xu committed
CANN: Improve the Inferencing Performance for Ascend NPU Device (llama/10454) f9fd6d6
Shanshan Shen, Frank Mai committed
CANN: RoPE and CONCAT operator optimization (llama/10488) b357ea7
vulkan: Fix a vulkan-shaders-gen argument parsing error (llama/10484) 6a4b6ae
metal : enable mat-vec kernels for bs <= 4 (llama/10491) 6d07dee
llama : accept a list of devices to use to offload a model (llama/10497) 6d7599e
Diego Devesa committed
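The device-list change above (llama/10497) adds a `devices` field to `llama_model_params`, a NULL-terminated list of backend devices to offload to. A minimal sketch of how it might be used, assuming the llama.cpp C API as of this sync; "CUDA0" and "model.gguf" are placeholders:

    #include "llama.h"
    #include "ggml-backend.h"

    int main(void) {
        // Restrict offload to an explicit device list (NULL-terminated).
        ggml_backend_dev_t devices[] = {
            ggml_backend_dev_by_name("CUDA0"), // returns NULL if no such device
            NULL,
        };

        struct llama_model_params params = llama_model_default_params();
        params.devices = devices; // leaving this NULL uses all available devices

        struct llama_model * model = llama_load_model_from_file("model.gguf", params);
        if (model == NULL) {
            return 1;
        }
        llama_free_model(model);
        return 0;
    }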
ggml : add support for dynamic loading of backends (llama/10469) b73266f
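The dynamic-loading commit (llama/10469) lets backends be built as shared libraries and discovered at runtime through the backend registry. A rough sketch of the registry calls involved, assuming the `ggml-backend.h` API from this sync:

    #include <stdio.h>
    #include "ggml-backend.h"

    int main(void) {
        // Load every ggml backend shared library found in the default
        // search paths (CPU, CUDA, Vulkan, ...).
        ggml_backend_load_all();

        // Enumerate what the registry picked up.
        for (size_t i = 0; i < ggml_backend_reg_count(); i++) {
            ggml_backend_reg_t reg = ggml_backend_reg_get(i);
            printf("backend: %s\n", ggml_backend_reg_name(reg));
        }
        return 0;
    }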
metal : minor code formatting 385a521
ggml : do not use ARM features not included in the build (llama/10457) 0001327
Diego Devesa committed
CANN: Support Ascend310P to accelerate F32 and F16 Model (llama/10216) c9e03e6
leo-pony committed
cuda : optimize argmax (llama/10441) 69ae50d
vulkan: predicate max operation in soft_max shaders (llama/10437) 0a14325
vulkan: copy iq4_nl LUT into shared memory (llama/10409) c31abdb
vulkan: further optimize mul_mat_vec using larger loads (llama/10387) 50a2978
add cmake rvv support (llama/10411) e0bf47c
haopeng committed
CUDA: remove unnecessary warp reduce in FA (ggml/1032) 9a8c238
feat: add `GGML_UNARY_OP_ARGMAX` Metal kernel (ggml/1019) c7e59ef
metal : add `GGML_OP_CONV_TRANSPOSE_1D` kernels (ggml/1026) 9c845f4
Do not include arm_neon.h when compiling CUDA code (ggml/1028) 80663f4
Frankie Robertson committed
ggml-opt: fix data corruption (ggml/1022) a916e92
ruby : Add low-level methods to transcribe (#2585) 4bf69ed
models : add `q8_0` models to `download-ggml-model.sh` (#2589) 7feeb43
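With #2589, the quantized `q8_0` variants can be fetched by name from the download script; for example (model name assumed, the script prints the full list of valid names):

    ./models/download-ggml-model.sh base.en-q8_0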
ruby : Follow source tree change (#2580) 7895d75
whisper : use backend registry (#0) b9f5e40
ggml/sched : do not skip views in pre-assignments b1eba61
slaren committed
whisper : adapt to new ggml (wip) ec6f374
talk-llama : sync llama.cpp 1568fc8
sync : ggml e3c317a
ggml : sync resolve (skip) (#0) d4d67dc
Add required ggml-base and backend libs to cmake pkg (llama/10407) 8fdd994
bandoti committed
cuda : fix CUDA_FLAGS not being applied (llama/10403) 22e1593
Diego Devesa committed
sycl : Add option to set the SYCL architecture for all targets (llama/10266) 0d836df
Romain Biessy committed
vulkan: Optimize soft_max (llama/10301) 5cb851d
sycl: Revert MUL_MAT_OP support changes (llama/10385) 6df9941
Alberto Cabrera Pérez committed
cuda : only use native when supported by cmake (llama/10389) 24d2e82
Diego Devesa committed