llama.cpp/tests
Francis Couture-Harpin 3bc7103d2e ggml : avoid multiply by D in GGML_OP_SSM_SCAN
This makes the weight buft detection in src/llama.cpp simpler.

* convert : transpose Mamba-2 A, D and reshape SSM_NORM

This breaks existing conversions of Mamba-2 models,
but avoids some reshapes.

Not sure if it's a good idea,
but it makes the graph slightly cleaner.

* llama : more appropriate SSM_SCAN and SSM_CONV buft support checks
2024-11-04 13:29:47 -05:00