whisper.cpp

Commit History

sync : ggml

f0a0087
unverified

ggerganov HF Staff committed on Feb 12, 2024

ggml-alloc : allocate all leafs as if they were inputs (ggml/731)

a512417
unverified

slaren committed on Feb 12, 2024

talk-llama : sync llama.cpp

aa42df9
unverified

ggerganov HF Staff committed on Feb 12, 2024

sync : ggml

be7d266
unverified

ggerganov HF Staff committed on Feb 12, 2024

ggml-backend : sync remnant

3f5165f
unverified

ggerganov HF Staff committed on Feb 12, 2024

CUDA: mul_mat_vec_q tiling, refactor mul mat logic (llama/5434)

c0cfa9b
unverified

JohannesGaessler slaren committed on Feb 11, 2024

vulkan: only use M-sized matmul on Apple GPUs (llama/5412)

350284e
unverified

Sergio López committed on Feb 11, 2024

ggml : fix compile warnings (unused vars) (llama/4966)

97fa2e3
unverified

ggerganov HF Staff committed on Feb 11, 2024

ggml : add mmla kernels for quantized GEMM (llama/4966)

0d50a29
unverified

snadampal committed on Feb 11, 2024

metal : use autoreleasepool to avoid memory leaks (llama/5437)

c276f12
unverified

irbull committed on Feb 10, 2024

ggml-alloc : v3 (ggml/727)

5cffd6f
unverified

slaren committed on Feb 11, 2024

examples : added audio_ctx argument to main and server (#1857)

469988b
unverified

dscripka ggerganov HF Staff committed on Feb 12, 2024

metal : option to embed MSL source into compiled binary (#1842)

a46b62a
unverified

Didzis Gosko committed on Feb 11, 2024

examples : initialize context params properly (#1852)

3443ee7
unverified

ggerganov HF Staff committed on Feb 11, 2024

talk-llama : sync llama.cpp

e6d6e1d
unverified

ggerganov HF Staff committed on Feb 10, 2024

sync : ggml

94800c5
unverified

ggerganov HF Staff committed on Feb 10, 2024

src : relocate new backend sources

44cd2d4
unverified

ggerganov HF Staff committed on Feb 10, 2024

ggml : fix `error C2078: too many initializers` for MSVC ARM64 (llama/5404)

8ebb36c
unverified

Michael Podvitskiy committed on Feb 9, 2024

CUDA: more warps for mmvq on NVIDIA (llama/5394)

7ab774c
unverified

JohannesGaessler committed on Feb 8, 2024

CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (llama/5386)

3ff7660
unverified

JohannesGaessler committed on Feb 7, 2024

Basic Vulkan Multi-GPU implementation (llama/5321)

5d130aa
unverified

OccamRazor slaren committed on Feb 7, 2024

CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370)

7aa3216
unverified

JohannesGaessler committed on Feb 6, 2024

Slight quantization improvement for Q4_K and Q5_K (llama/5361)

e3cd020
unverified

Kawrakow ikawrakow committed on Feb 6, 2024

CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351)

ae45b38
unverified

JohannesGaessler committed on Feb 6, 2024

ggml : make use of ggml-quants.h possible in C++ code (llama/5338)

963ade6
unverified

Kawrakow ikawrakow committed on Feb 5, 2024

ggml : avoid duplicating function calls using MIN/MAX macros (llama/5325)

9bb2b0a
unverified

Dr. Tom Murphy VII Ph.D ggerganov HF Staff committed on Feb 5, 2024

iq2_xxs: tune quantization (llama/5320)

11e5f6b
unverified

Kawrakow ikawrakow committed on Feb 5, 2024

cuda : fix LLAMA_CUDA_F16 (llama/5262)

5fd8fb7
unverified

slaren committed on Feb 1, 2024

metal : add im2col F32 dst support (llama/5132)

26aec77
unverified

ggerganov HF Staff committed on Jan 31, 2024

llava : add MobileVLM support (llama/5132)

f17a416
unverified

JidongZhang-THU slaren committed on Jan 31, 2024

ggml : limit n_threads to the max n_tasks (llama/5238)

2645c33
unverified

slaren committed on Jan 31, 2024

kompute : llama-bench support and ggml_cpu_has_kompute() (llama/5226)

0c9c434
unverified

Cebtenzzre committed on Jan 31, 2024

ggml : add abort_callback for cpu backend (ggml/725)

a8ea91b
unverified

Michael Podvitskiy committed on Feb 9, 2024

extra : update sync scripts

d99e873
unverified

ggerganov HF Staff committed on Feb 10, 2024

server : allow CORS request with authorization headers (#1850)

16a6639
unverified

Valentin Gosu committed on Feb 9, 2024

whisper.android : how to build with CLBlast (#1809)

eea7f53
unverified

lcfrs ggerganov HF Staff committed on Feb 9, 2024

whisper : expose CUDA device setting in public API (#1840)

d13ee66
unverified

Didzis Gosko committed on Feb 9, 2024

make : add macOS deployment target option (#1839)

9c90601
unverified

Didzis Gosko committed on Feb 9, 2024

talk-llama : stream response (#1121)

2193f2b
unverified

ggerganov HF Staff committed on Feb 6, 2024

sync : ggml (#0)

fded75b
unverified

ggerganov HF Staff committed on Jan 30, 2024

ggml : fix IQ3_XXS on Metal (llama/5219)

f066321
unverified

Kawrakow ikawrakow committed on Jan 30, 2024

sync : ggml (llama/0)

cdb7964
unverified

ggerganov HF Staff committed on Jan 30, 2024

Faster AVX2 dot product for IQ2_XS (llama/5187)

187ae44
unverified

Kawrakow ikawrakow PeterReid committed on Jan 30, 2024

SOTA 3-bit quants (llama/5196)

4649943
unverified

Kawrakow ikawrakow committed on Jan 30, 2024

ggml alloc: Fix for null dereference on alloc failure (llama/5200)

8181686
unverified

Paul Tsochantaris committed on Jan 29, 2024

Nomic Vulkan backend (llama/4456)

f5fd92d
unverified

Cebtenzzre niansa manyoso apage43 ToKiNoBug ggerganov HF Staff slaren committed on Jan 29, 2024

ggml : add max buffer sizes to opencl and metal backends (llama/5181)

3d354d0
unverified

slaren committed on Jan 29, 2024

metal : free metal objects (llama/5161)

ea7167a
unverified

Paul Tsochantaris committed on Jan 28, 2024

gguf : fix comparison (ggml/715)

80cfca4
unverified

ggerganov HF Staff committed on Jan 29, 2024

`ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686)

75d438c
unverified

John Balis slaren committed on Jan 29, 2024
