llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-27 19:53:42 -04:00

Files

slaren 1123f7fbdf ggml-cuda : use graph allocator (#2684)

use a different function for no_alloc to avoid breaking backwards compat, fixes lora

remove 512 n_batch limit

fixed 2048 batch size

cleanup

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

2023-08-22 15:25:19 +02:00

CMakeLists.txt

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

common.cpp

ggml-cuda : use graph allocator (#2684)

2023-08-22 15:25:19 +02:00

common.h

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

console.cpp

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

console.h

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

grammar-parser.cpp

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

grammar-parser.h

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00