llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-08-15 20:53:00 -04:00

Files

amritahs-ibm c7b43ab608 llamafile : ppc64le MMA implementation for Q4_0. (#12489)

This change upstreams llamafile's CPU matrix
multiplication kernels for the ppc64le ISA using MMA
builtins. This patch handles matrix multiplication
between the quantised datatypes block_q4_0 and
block_q8_0.
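The MMA kernels compute tiles of many block-wise dot products between block_q4_0 and block_q8_0 rows at once. The scalar sketch below shows the arithmetic for one such dot product. It is modelled on the generic (non-SIMD) Q4_0 x Q8_0 path in ggml and is not the MMA code itself. The `_sketch` struct names are hypothetical, and storing the per-block scale `d` as a `float` instead of ggml's fp16 is a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

#define QK4_0 32
#define QK8_0 32

/* Hypothetical stand-in for block_q4_0: ggml stores d as fp16;
 * a float is used here (assumption) to keep the sketch self-contained. */
typedef struct {
    float   d;              /* per-block scale */
    uint8_t qs[QK4_0 / 2];  /* 32 unsigned 4-bit values, two per byte */
} block_q4_0_sketch;

/* Hypothetical stand-in for block_q8_0 (same float-scale assumption). */
typedef struct {
    float  d;               /* per-block scale */
    int8_t qs[QK8_0];       /* 32 signed 8-bit values */
} block_q8_0_sketch;

/* Dot product of n values (n a multiple of 32) stored as Q4_0 (x) and Q8_0 (y). */
static float vec_dot_q4_0_q8_0(int n, const block_q4_0_sketch *x,
                               const block_q8_0_sketch *y) {
    assert(n % QK8_0 == 0);
    const int nb = n / QK8_0;
    float sum = 0.0f;
    for (int i = 0; i < nb; ++i) {
        int32_t sumi = 0;  /* integer accumulator for one block pair */
        for (int j = 0; j < QK4_0 / 2; ++j) {
            /* Low nibble holds element j, high nibble holds element j + 16.
             * Subtracting 8 maps the unsigned range [0, 15] to [-8, 7]. */
            const int v0 = (x[i].qs[j] & 0x0F) - 8;
            const int v1 = (x[i].qs[j] >> 4) - 8;
            sumi += v0 * y[i].qs[j] + v1 * y[i].qs[j + QK4_0 / 2];
        }
        /* Apply both block scales once per block, after the integer sum. */
        sum += (float) sumi * x[i].d * y[i].d;
    }
    return sum;
}
```

For example, a Q4_0 block whose bytes all decode to +1 (low nibble 9) and +2 (high nibble 10) with d = 0.5, dotted with a Q8_0 block of all 2s with d = 0.25, gives 16 · (1 · 2 + 2 · 2) · 0.5 · 0.25 = 12. Keeping the inner sum in integers and scaling once per block is what makes the int8 outer-product accumulation of an MMA unit a natural fit.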

This change results in a 5% to 50% improvement
in total speed (i.e. all tokens / total time)
across various batch sizes.
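One way to read the reported metric, assuming "total speed" is throughput over all processed tokens (prompt and generated) and the improvement is relative to the pre-patch build:

```latex
S = \frac{N_{\text{tokens}}}{T_{\text{total}}}, \qquad
\Delta = \frac{S_{\text{new}} - S_{\text{old}}}{S_{\text{old}}} \times 100\,\%
```

Under this reading, the 5% to 50% range corresponds to $1.05\,S_{\text{old}} \le S_{\text{new}} \le 1.5\,S_{\text{old}}$, depending on batch size.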

The patch was tested with the Meta-Llama-3-8B,
Mistral-7B and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

2025-03-27 08:51:47 +02:00

cmake

cmake : enable building llama.cpp using system libggml (#12321)

2025-03-17 11:05:23 +02:00

include

llama: Add support for RWKV v7 architecture (#12412)

2025-03-18 07:27:50 +08:00

src

llamafile : ppc64le MMA implementation for Q4_0. (#12489)

2025-03-27 08:51:47 +02:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : riscv: add 128-bit RVV support (#12530)

2025-03-27 08:38:34 +02:00