Commit History
cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (llama/14741) bb523fb
Oliver Simons committed on
CUDA: set_rows + cpy.cu refactor (llama/14712) 536128f
use max work group size for device to replace the magic number (llama/14732) e5e9b79
Neo Zhang Jianyu committed on
ggml: Add initial WebGPU backend (llama/14521) 0dd208f
Reese Levine committed on
llama : add high-throughput mode (llama/14363) b2d73a2
ggml : add asserts (llama/14720) 7073590
vulkan: fix noncontig check for mat_mul_id splitting (llama/14683) 4d0d8b8
vulkan: add RTE variants for glu/add/sub/mul/div (llama/14653) bac21a7
cuda: fix build warnings in set-rows.cu (unused variable) (llama/14687) 1e145c7
sycl: Hotfix for non dnnl codepath (llama/14677) 75496c9
Anton Mitkov committed on
ggml : refactor llamafile_sgemm PPC code (llama/14673) 3e2a209
SYCL: use 1D kernel for set_rows (llama/14618) b305121
sycl: Batched mulmat rework for oneDNN dispatch (llama/14617) 2722bea
Anton Mitkov committed on
cuda : add set rows for bf16 (llama/14664) 1f97ff4
Sigbjørn Skjæret committed on
cuda : add ELU support (llama/14657) cbe8006
Yavor Ivanov committed on
ggml : add build-time message to remind about ggml_set_rows (llama/14661) 0f5d4ba
metal : Add missing unary ops Metal support (llama/14660) 2ed022e
Yavor Ivanov committed on
CUDA: add set rows for f32 and f16 (llama/14551) e51f2d4
whisper: validate get_rows support for cpu extra buffer (#3323) 3c6ba32 unverified
Charles Xu committed on
examples : update links in wasm examples (#3318) e03b5a6 unverified
sync : resolve conflicts (#0) 5ec49ef
talk-llama : sync llama.cpp bc53087
sync : ggml 116dcaa
sync : resolve conflicts (ggml/0) 497add0
vulkan: support SET_ROWS (llama/14587) 9821f43
vulkan: optimizations for deepseek prompt processing (llama/14555) 04b631e
model : support LiquidAI LFM2 hybrid family (llama/14620) 07ff90a
Tarek Dakhran committed on
HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (llama/14634) 4354560
Slobodan Josic committed on
opencl: add tiled mul_mat_f16_f32 (llama/14535) 398dc49
opencl: add `set_rows` for `f16` and `f32` (llama/14547) 5e203ec
lhez committed on
SYCL: Initial set_rows kernel implementation (llama/14562) e62ef85
cuda : support Falcon-H1 state size for SSM_SCAN (llama/14602) 92b2d32
ggml : add ggml_scale_bias (llama/14417) 573d50a
ggml : prevent integer overflow in gguf tensor size calculation (llama/14595) 31f34e7
vulkan: optimize flash attention split_k_reduce (llama/14554) 45fbb42
vulkan : fix rope with partial rotation and non-cont src (llama/14582) 367fa85
cuda : fix rope with partial rotation and non-cont src (llama/14580) aaf2d96
CUDA: add bilinear interpolation for upscale (llama/14563) 68ded09
musa: fix build warnings (unused variable) (llama/14561) 891b1d1
CUDA: add bf16 and i32 to getrows (llama/14529) 014494c
vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (llama/14485) effd61f
Eve Rémy Oudompheng committed on
vulkan: fix rms_norm+mul fusion (llama/14545) 0791e65
vulkan: Handle updated FA dim2/3 definition (llama/14518) d1e619e
opencl: add GELU_ERF (llama/14476) b19d736
Sigbjørn Skjæret committed on
metal : disable fast math in all quantize kernels (llama/14528) df9d510
CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (llama/14002) b9b5859
luyhcsu luyuhong committed on
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (llama/14445) f798922
Sigbjørn Skjæret committed on
opencl : broadcast for soft_max (llama/14510) 4434043
lhez committed on