cmake : fix compile assumptions for power9/etc (#2777) 4683df3 midnight committed on Feb 5, 2025
HIP: fix flash_attn_stream_k_fixup warning (llama/11604) acfd94f JohannesGaessler committed on Feb 2, 2025
CUDA/HIP: add support for selectable warp size to mmv (llama/11519) ed08269 uvos committed on Feb 2, 2025
HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (llama/11601) 4850c24 uvos committed on Feb 2, 2025
CUDA: use mma PTX instructions for FlashAttention (llama/11583) f328957 JohannesGaessler Diego Devesa committed on Feb 2, 2025
`ci`: use sccache on windows instead of ccache (llama/11545) 9ed1962 Olivier Chafik committed on Jan 31, 2025
vulkan: implement initial support for IQ2 and IQ3 quantizations (llama/11360) bd93c1b Rémy Oudompheng jeffbolznv committed on Jan 29, 2025
vulkan: Catch pipeline creation failure and print an error message (llama/11436) d4f6b2c jeffbolznv committed on Jan 29, 2025
HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (llama/11080) 82bb7f3 Nikita Sarychev committed on Jan 28, 2025
SYCL : SOFTMAX F16 mask support and other fixes (llama/11261) 8aaf0c8 qnixsynapse committed on Jan 28, 2025
AMD: parse the architecture as supplied by gcnArchName (llama/11244) 04b01d8 Haus1 committed on Jan 27, 2025
metal: Handle null returned from MTLCreateSystemDefaultDevice() (llama/11441) 4e38ed4 Ihar Hrachyshka committed on Jan 27, 2025
cmake: add ggml find package (llama/11369) ca6577f bandoti ggerganov committed on Jan 26, 2025
HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420) 2cc4df4 uvos committed on Jan 25, 2025
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (llama/11356) 6f5687a uvos committed on Jan 24, 2025
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380) 855a9fe JohannesGaessler committed on Jan 24, 2025
vulkan: sort shaders for more deterministic binary (llama/11315) d7c0046 jeffbolznv committed on Jan 23, 2025
rpc : better caching of the base buffer pointer (llama/11331) 81a6cae rgerganov committed on Jan 21, 2025
cmake : add sanitizer flags for llama.cpp (llama/11279) 3547979 ggerganov JohannesGaessler committed on Jan 18, 2025
vulkan: fix coopmat2 flash attention for non-contiguous inputs (llama/11281) e0e73fa jeffbolznv committed on Jan 18, 2025
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (llama/11166) 3bb9e77 jeffbolznv committed on Jan 16, 2025
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (llama/11206) ee122d3 jeffbolznv committed on Jan 16, 2025
vulkan: optimize coopmat2 q2_k dequant function (llama/11130) d49a569 jeffbolznv committed on Jan 16, 2025
CUDA: backwards pass for misc. ops, add tests (llama/11257) 2fbcec1 JohannesGaessler committed on Jan 16, 2025
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (llama/11227) bf3dc93 fj-y-saito ggerganov committed on Jan 16, 2025
RoPE: fix back, CUDA support for back + noncont. (llama/11240) 131a21e JohannesGaessler committed on Jan 15, 2025
ggml : add option to not print stack on abort (ggml/1081) 9b2706e William Tambellini Diego Devesa committed on Jan 23, 2025
ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) 8e57313 issixx issi committed on Jan 17, 2025
GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030) 92311a3 JohannesGaessler committed on Jan 14, 2025
ggml : add opencl backend (skip) (llama/10693) 226358f lhez Skyler Szot Shangqing Gu Alexander Angus Hongqiang Wang Max Krasnyansky committed on Jan 14, 2025
cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (llama/11042) 25882f6 Andreas Kieslinger slaren committed on Jan 13, 2025