whisper.cpp / ggml / src / ggml-cuda / ggml-cuda.cu

Commit History

HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (llama/12209)
18afa4b

uvos committed on

ggml : upgrade init_tensor API to return a ggml_status (llama/11854)
d6b6852

William Tambellini and slaren committed on

cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)
f959b90

cmdr2 committed on

cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
67e8c32

cmdr2 committed on

CUDA: add option to compile without FlashAttention (llama/12025)
fbc5f16

JohannesGaessler committed on

cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)
6cb8158

Garf committed on

MUSA: support ARM64 and enable dp4a, etc. (llama/11843)
ab96dac

Bodhi Hu committed on

musa: bump MUSA SDK version to rc3.1.1 (llama/11822)
ff2d3eb

R0CKSTAR committed on

HIP: Switch to std::vector in rocblas version check (llama/11820)
e144c94

uvos committed on

CUDA: fix CUDART_VERSION checks (llama/11821)
04f123a

JohannesGaessler committed on

CUDA: use arch list for compatibility check (llama/11775)
b88e163

JohannesGaessler and Diego Devesa committed on

CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)
78e36a2

JohannesGaessler committed on

HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (llama/11601)
4850c24

uvos committed on

HIP: Prepare reduction operators for wave 64
bc1c1a4

uvos committed on

CUDA/HIP: add warp_size to cuda_device_info
e538e2c

uvos committed on

HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (llama/11080)
82bb7f3

Nikita Sarychev committed on

AMD: parse the architecture as supplied by gcnArchName (llama/11244)
04b01d8

Haus1 committed on

HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420)
2cc4df4

uvos committed on

hip : Add hipGraph and VMM support to ROCM (llama/11362)
089afa0

uvos committed on

CUDA: fix FP16 cuBLAS GEMM (llama/11396)
7b7c5d3

JohannesGaessler committed on

rocBLAS: Avoid fp32->fp16->fp32 conversion on CDNA (llama/11356)
6f5687a

uvos committed on

CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380)
855a9fe

JohannesGaessler committed on

CUDA: backwards pass for misc. ops, add tests (llama/11257)
2fbcec1

JohannesGaessler committed on

RoPE: fix back, CUDA support for back + noncont. (llama/11240)
131a21e

JohannesGaessler committed on

cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (llama/11042)
25882f6

Andreas Kieslinger and slaren committed on

CUDA: add BF16 support (llama/11093)
961ef57

JohannesGaessler committed on

CUDA: rename macros to avoid conflicts with WinAPI (llama/10736)
8544072

Andreas Kieslinger committed on

ggml : refactor online repacking (llama/10446)
163128e

Djip007 and ggerganov committed on

Add some minimal optimizations for CDNA (llama/10498)
bf49bbe

uvos committed on

ggml : add support for dynamic loading of backends (llama/10469)
b73266f

Diego Devesa and ggerganov committed on

CUDA: fix MMV kernel being used for FP16 src1 (llama/10357)
af4dff1

JohannesGaessler committed on

CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318)
e446f60

JohannesGaessler committed on

ggml : build backends as libraries (llama/10256)
3dc93f3

Diego Devesa, ggerganov, and R0CKSTAR committed on