Commit History

metal : fix the crash caused by the lack of residency set support on Intel Macs. (llama/11904)
afbd891

Hale Chan committed on

metal : optimize dequant q6_K kernel (llama/11892)
376cbe6

Adrian Kretz committed on

repo : update links to new url (llama/11886)
9705bb5

ggerganov committed on

vulkan: initial support for IQ1_S and IQ1_M quantizations (llama/11528)
0d2e888

Rémy O committed on

opencl: Fix rope and softmax (llama/11833)
bf3b6f8

lhez committed on

cuda : add ampere to the list of default architectures (llama/11870)
1d19dec

Diego Devesa committed on

ggml: optimize some vec dot functions for LoongArch ASX (llama/11842)
e3acbfc

Jinyang He committed on

vulkan: linux builds + small subgroup size fixes (llama/11767)
e3f0e78

Eve committed on

llamafile: use member variable instead of constant for iq4nlt (llama/11780)
0cb2d04

jmorganca committed on

musa: bump MUSA SDK version to rc3.1.1 (llama/11822)
ff2d3eb

R0CKSTAR committed on

ggml-cpu : add chunking support to mul_mat_id (llama/11666)
e59d9a7

Diego Devesa committed on

ggml : x2 speed for WASM by optimizing SIMD (llama/11453)
464a186

Xuan-Son Nguyen and camel-cdr committed on

HIP: Remove GCN from list of devices that avoid MMQ (llama/11831)
78aed55

uvos committed on

HIP: Switch to std::vector in rocblas version check (llama/11820)
e144c94

uvos committed on

ggml : fix multi-threaded clamp_f32 (llama/11824)
1b1d6a8

Richard committed on

ggml-cpu: Fix duplicate MATMUL_INT8 (llama/11817)
05b9e78

ownia committed on

CUDA: fix CUDART_VERSION checks (llama/11821)
04f123a

JohannesGaessler committed on

Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (llama/11803)
86969ac

Sheldon Robinson committed on

CUDA: use arch list for compatibility check (llama/11775)
b88e163

JohannesGaessler and Diego Devesa committed on

fix: typos in documentation files (llama/11791)
5c6d350

Maxim Evtush committed on

vulkan: Make Vulkan optional at runtime (ggml/11493). (llama/11494)
762f497

Danny Milosavljevic and jeffbolznv committed on

vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (llama/11592)
f9fd130

Wagner Bruna committed on

vulkan: account for lookup tables when checking shared memory size (llama/11502)
758970f

jeffbolznv committed on

ggml: Fix data race in ggml threadpool (llama/11736)
5554d5f

Karol Kontny committed on

CUDA: fix min. version for movmatrix (llama/11751)
9ac5316

JohannesGaessler committed on

vulkan: print shared memory size (llama/11719)
fb33a94

jeffbolznv committed on

SYCL: remove XMX info from print devices (llama/11712)
dea29f2

qnixsynapse committed on

ggml : optimize and build warning fix for LoongArch (llama/11709)
b82d241

Jinyang He committed on

SYCL: Adjust support condition for norm operators (llama/11674)
7e1dbe9

qnixsynapse committed on

ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)
f7296aa

junchao-zhao committed on

vulkan: optimize coopmat2 iq2/iq3 callbacks (llama/11521)
3731f13

jeffbolznv committed on

vulkan: initial support for IQ4_XS quantization (llama/11501)
ed46ad5

Rémy O committed on

vulkan: use smaller combined allocations to avoid fragmentation (llama/11551)
1b7672d

jeffbolznv committed on

metal : avoid breaking build when metal API predates TARGET_OS_VISION (llama/11690)
5bdb244

charles-dyfis-net committed on

metal : adjust support conditions for norm operators (llama/11671)
5eb35ab

ggerganov committed on

CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)
78e36a2

JohannesGaessler committed on

HIP: force max threads per block to be 1024 (llama/11621)
f509509

fxzjshm committed on

metal : use residency set for other platforms (llama/11648)
0e58088

jhenjie committed on

rpc: fix known RCE in rpc-server (ggml/1103)
76be3a9

Retr0REG committed on

cmake : fix compile assumptions for power9/etc (#2777)
4683df3

midnight committed on

CUDA: fix Volta FlashAttention logic (llama/11615)
6df9571

JohannesGaessler committed on

HIP: fix flash_attn_stream_k_fixup warning (llama/11604)
acfd94f

JohannesGaessler committed on

CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
ed08269

uvos committed on

HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing CC architectures for AMD GPUs are not supersets of each other (llama/11601)
4850c24

uvos committed on

CUDA: use mma PTX instructions for FlashAttention (llama/11583)
f328957

JohannesGaessler and Diego Devesa committed on

`ci`: use sccache on windows instead of ccache (llama/11545)
9ed1962

Olivier Chafik committed on

HIP: require at least HIP 5.5
72c425b

uvos committed on

HIP: Prepare reduction operators for wave 64
bc1c1a4

uvos committed on

CUDA/HIP: add warp_size to cuda_device_info
e538e2c

uvos committed on