llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-08-17 13:40:55 -04:00)

Files

Francis Couture-Harpin 3bc7103d2e ggml : avoid multiply by D in GGML_OP_SSM_SCAN

This makes the weight buft detection in src/llama.cpp simpler.

* convert : transpose Mamba-2 A, D and reshape SSM_NORM

This breaks existing conversions of Mamba-2 models
to avoid some reshapes.

Not sure if it's a good idea,
but it makes the graph slightly cleaner.

* llama : more appropriate SSM_SCAN and SSM_CONV buft support checks

2024-11-04 13:29:47 -05:00

cmake

…

include

ggml : avoid multiply by D in GGML_OP_SSM_SCAN

2024-11-04 13:29:47 -05:00

src

ggml : avoid multiply by D in GGML_OP_SSM_SCAN

2024-11-04 13:29:47 -05:00

.gitignore

…

CMakeLists.txt

add amx kernel for gemm (#8998)

2024-10-18 13:34:36 +08:00