ggml : add Flash Attention (llama/5021) 34d3b03 ggerganov, JohannesGaessler, phymbert committed on Apr 30, 2024
Fix more int overflow during quant (PPL/CUDA). (llama/6563) 531387f dranger003 committed on Apr 28, 2024