metal : simplify kernel arguments using a struct (ggml/3229) (llama/12194) 092277a BB-fat alexju committed on Mar 7, 2025
opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (llama/12217) 94449e3 lhez committed on Mar 7, 2025
cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (llama/12094) dc68418 xiaofei Ray Lee committed on Mar 6, 2025
CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222) 4dc8a81 JohannesGaessler committed on Mar 6, 2025
HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it (llama/12209) 18afa4b uvos committed on Mar 6, 2025
opencl : fix `ulong` kernel args being set from `int` variables (llama/12174) 67ffff0 linehill committed on Mar 6, 2025
ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154) 05466a9 Rémy O committed on Mar 6, 2025
SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201) 723b8b4 qnixsynapse committed on Mar 5, 2025
ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) c9a49f9 vmobilis committed on Mar 7, 2025
ggml : portability fixes for VS 2017 (llama/12150) 49e3343 mgroeber9110 Marcus Groeber committed on Mar 4, 2025
HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032) a027c1d David Huang committed on Mar 3, 2025
SYCL: Move CPY kernels to a separate file and add few missing kernels (llama/12133) 1d6d451 qnixsynapse committed on Mar 3, 2025
ggml-backend : keep paths in native string type when possible (llama/12144) 6e89d8c Diego Devesa committed on Mar 2, 2025
CUDA: compress mode option and default to size (llama/12029) 4ec988a Green-Sky committed on Mar 1, 2025
ggml : upgrade init_tensor API to return a ggml_status (llama/11854) d6b6852 William Tambellini slaren committed on Feb 28, 2025
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595) d7d82b9 Rémy O committed on Feb 28, 2025
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098) 0b52fcc JohannesGaessler committed on Feb 28, 2025
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064) 459beb1 Prashant Vithule vithulep committed on Feb 28, 2025
vulkan: fix assertion when qy_needs_dequant (llama/12068) 271c7e4 jeffbolznv committed on Feb 25, 2025
cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129) f959b90 cmdr2 committed on Feb 28, 2025
cuda/cpu: Increase support for fp16 unary operations (ggml/1125) 67e8c32 cmdr2 committed on Feb 28, 2025
whisper : support GGML_BACKEND_DL (#2843) 2e6437e Diego Devesa ggerganov committed on Feb 27, 2025
Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121) 2b94a24 cmdr2 committed on Feb 25, 2025
metal : copy kernels for quant to F32/F16 conversions (llama/12017) 6c8e7ec Garf ggerganov committed on Feb 25, 2025
opencl: fix for small models (llama/11950) 4532dc6 lhez Shawn Gu Skyler Szot committed on Feb 24, 2025
Optimize mul_mat for Q4_0 on Intel GPU (llama/12035) 14fd317 Neo Zhang Jianyu arthw committed on Feb 24, 2025
ggml-cpu: Support s390x SIMD Instruction Set (llama/12019) 4aa54ec Aaron Teo Jinyang He junchao-zhao committed on Feb 22, 2025
CUDA: add option to compile without FlashAttention (llama/12025) fbc5f16 JohannesGaessler committed on Feb 22, 2025
CUDA: optimize FA for GQA + large batches (llama/12014) 6662d54 JohannesGaessler committed on Feb 22, 2025
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000) 6cb8158 Garf committed on Feb 22, 2025
CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984) 6641178 PureJourney JohannesGaessler committed on Feb 21, 2025
MUSA: support ARM64 and enable dp4a etc. (llama/11843) ab96dac Bodhi Bodhi Hu committed on Feb 21, 2025
ggml-cpu: Add CPU backend support for KleidiAI library (llama/11390) 9de6d81 Charles Xu committed on Feb 20, 2025
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (llama/11917) 1a1acd2 Prashant Vithule vithulep ggerganov committed on Feb 20, 2025
CUDA: use async data loading for FlashAttention (llama/11894) 5b9980d JohannesGaessler Diego Devesa committed on Feb 17, 2025
vulkan: implement several ops relevant for ggml_opt (llama/11769) 3c2171d Rémy O committed on Feb 17, 2025
vulkan: support multi/vision rope, and noncontiguous rope (llama/11902) 1c7a669 jeffbolznv committed on Feb 16, 2025