Commit History

metal : fix the crash caused by the lack of residency set support on Intel Macs. (llama/11904)
afbd891

Hale Chan committed on

metal : optimize dequant q6_K kernel (llama/11892)
376cbe6

Adrian Kretz committed on

repo : update links to new url (llama/11886)
9705bb5

ggerganov committed on

vulkan: initial support for IQ1_S and IQ1_M quantizations (llama/11528)
0d2e888

Rémy O committed on

opencl: Fix rope and softmax (llama/11833)
bf3b6f8

lhez committed on

cuda : add ampere to the list of default architectures (llama/11870)
1d19dec

Diego Devesa committed on

ggml: optimize some vec dot functions for LoongArch ASX (llama/11842)
e3acbfc

Jinyang He committed on

vulkan: linux builds + small subgroup size fixes (llama/11767)
e3f0e78

Eve committed on

llamafile: use member variable instead of constant for iq4nlt (llama/11780)
0cb2d04

jmorganca committed on

musa: bump MUSA SDK version to rc3.1.1 (llama/11822)
ff2d3eb

R0CKSTAR committed on

ggml-cpu : add chunking support to mul_mat_id (llama/11666)
e59d9a7

Diego Devesa committed on

ggml : x2 speed for WASM by optimizing SIMD (llama/11453)
464a186

Xuan-Son Nguyen and camel-cdr committed on

HIP: Remove GCN from list of devices that avoid MMQ (llama/11831)
78aed55

uvos committed on

HIP: Switch to std::vector in rocblas version check (llama/11820)
e144c94

uvos committed on

ggml : fix multi-threaded clamp_f32 (llama/11824)
1b1d6a8

Richard committed on

ggml-cpu: Fix duplicate MATMUL_INT8 (llama/11817)
05b9e78

ownia committed on

CUDA: fix CUDART_VERSION checks (llama/11821)
04f123a

JohannesGaessler committed on

Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (llama/11803)
86969ac

Sheldon Robinson committed on

CUDA: use arch list for compatibility check (llama/11775)
b88e163

JohannesGaessler and Diego Devesa committed on

fix: typos in documentation files (llama/11791)
5c6d350

Maxim Evtush committed on

vulkan: Make Vulkan optional at runtime (ggml/11493). (llama/11494)
762f497

Danny Milosavljevic and jeffbolznv committed on

vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (llama/11592)
f9fd130

Wagner Bruna committed on

vulkan: account for lookup tables when checking shared memory size (llama/11502)
758970f

jeffbolznv committed on

ggml: Fix data race in ggml threadpool (llama/11736)
5554d5f

Karol Kontny committed on

CUDA: fix min. version for movmatrix (llama/11751)
9ac5316

JohannesGaessler committed on

vulkan: print shared memory size (llama/11719)
fb33a94

jeffbolznv committed on

SYCL: remove XMX info from print devices (llama/11712)
dea29f2

qnixsynapse committed on

ggml : optimize and build warning fix for LoongArch (llama/11709)
b82d241

Jinyang He committed on

SYCL: Adjust support condition for norm operators (llama/11674)
7e1dbe9

qnixsynapse committed on

ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)
f7296aa

junchao-zhao committed on

vulkan: optimize coopmat2 iq2/iq3 callbacks (llama/11521)
3731f13

jeffbolznv committed on

vulkan: initial support for IQ4_XS quantization (llama/11501)
ed46ad5

Rémy O committed on

vulkan: use smaller combined allocations to avoid fragmentation (llama/11551)
1b7672d

jeffbolznv committed on

metal : avoid breaking build when metal API predates TARGET_OS_VISION (llama/11690)
5bdb244

charles-dyfis-net committed on

metal : adjust support conditions for norm operators (llama/11671)
5eb35ab

ggerganov committed on

CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)
78e36a2

JohannesGaessler committed on

HIP: force max threads per block to be 1024 (llama/11621)
f509509

fxzjshm committed on

metal : use residency set for other platforms (llama/11648)
0e58088

jhenjie committed on

rpc: fix known RCE in rpc-server (ggml/1103)
76be3a9

Retr0REG committed on

cmake : fix compile assumptions for power9/etc (#2777)
4683df3

midnight committed on

CUDA: fix Volta FlashAttention logic (llama/11615)
6df9571

JohannesGaessler committed on

HIP: fix flash_attn_stream_k_fixup warning (llama/11604)
acfd94f

JohannesGaessler committed on

CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
ed08269

uvos committed on

HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing CC architectures for AMD GPUs are not supersets of each other (llama/11601)
4850c24

uvos committed on

CUDA: use mma PTX instructions for FlashAttention (llama/11583)
f328957

JohannesGaessler and Diego Devesa committed on

`ci`: use sccache on windows instead of ccache (llama/11545)
9ed1962

Olivier Chafik committed on

HIP: require at least HIP 5.5
72c425b

uvos committed on

HIP: Prepare reduction operators for wave 64
bc1c1a4

uvos committed on

CUDA/HIP: add warp_size to cuda_device_info
e538e2c

uvos committed on