llama.cpp

batched

speculative : refactor and add a simpler example (#10362)

2024-11-25 09:58:41 +02:00

batched-bench

llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)

2024-10-18 23:18:01 +02:00

batched.swift

llama : llama_perf + option to disable timings during decode (#9355)

2024-09-13 09:53:38 +03:00

convert-llama2c-to-ggml

common : use common_ prefix for common library functions (#9805)

2024-10-10 22:57:42 +02:00

cvector-generator

llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)

2024-10-18 23:18:01 +02:00

deprecation-warning

examples : remove finetune and train-text-from-scratch (#8669)

2024-07-25 10:39:04 +02:00

embedding

common : use common_ prefix for common library functions (#9805)

2024-10-10 22:57:42 +02:00

eval-callback

ggml : add support for dynamic loading of backends (#10469)

2024-11-25 15:13:39 +01:00

export-lora

common : use common_ prefix for common library functions (#9805)

2024-10-10 22:57:42 +02:00

gbnf-validator

llama : refactor sampling v2 (#9294)

2024-09-07 15:16:19 +03:00

gen-docs

common : use common_ prefix for common library functions (#9805)

2024-10-10 22:57:42 +02:00

gguf

gguf : handle null name during init (#8587)

2024-07-20 17:15:42 +03:00

gguf-hash

gguf-hash : update clib.json to point to original xxhash repo (#8491)

2024-07-16 10:14:16 +03:00

gguf-split

gguf-split : improve --split and --merge logic (#9619)

2024-10-02 10:21:57 +03:00

gritlm

common : use common_ prefix for common library functions (#9805)

2024-10-10 22:57:42 +02:00

imatrix

llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)

2024-10-18 23:18:01 +02:00

infill

speculative : refactor and add a simpler example (#10362)

2024-11-25 09:58:41 +02:00

jeopardy

…

llama-bench

ggml : add support for dynamic loading of backends (#10469)

2024-11-25 15:13:39 +01:00

llama.android

llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)

2024-10-18 23:18:01 +02:00

llama.swiftui

llama : default sampling changes + greedy update (#9897)

2024-10-21 09:46:40 +03:00

llava

speculative : refactor and add a simpler example (#10362)

2024-11-25 09:58:41 +02:00

lookahead

speculative : refactor and add a simpler example (#10362)

2024-11-25 09:58:41 +02:00

lookup

speculative : refactor and add a simpler example (#10362)

2024-11-25 09:58:41 +02:00

main

ggml : add support for dynamic loading of backends (#10469)

2024-11-25 15:13:39 +01:00

main-cmake-pkg

…

parallel

speculative : refactor and add a simpler example (#10362)

2024-11-25 09:58:41 +02:00

passkey

common : use common_ prefix for common library functions (#9805)

2024-10-10 22:57:42 +02:00

perplexity

llama/ex: remove --logdir argument (#10339)

2024-11-16 23:00:41 +01:00

quantize

quantize : improve type name parsing (#9570)

2024-09-20 20:55:36 +02:00

quantize-stats

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

retrieval

speculative : refactor and add a simpler example (#10362)

2024-11-25 09:58:41 +02:00

rpc

ggml : move CPU backend to a separate file (#10144)

2024-11-03 19:34:08 +01:00

run

Introduce llama-run (#10291)

2024-11-25 22:56:24 +01:00

save-load-state

speculative : refactor and add a simpler example (#10362)

2024-11-25 09:58:41 +02:00

server

server : fix parallel speculative decoding (#10513)

2024-11-26 13:36:40 +02:00

simple

ggml : add support for dynamic loading of backends (#10469)

2024-11-25 15:13:39 +01:00

simple-chat

ggml : add support for dynamic loading of backends (#10469)

2024-11-25 15:13:39 +01:00

speculative

llama : accept a list of devices to use to offload a model (#10497)

2024-11-25 19:30:06 +01:00

speculative-simple

speculative : simplify the implementation (#10504)

2024-11-26 12:29:38 +02:00

sycl

[SYCL]set context default value to avoid memory issue, update guide (#9476)

2024-09-18 08:30:31 +08:00

tokenize

common : use common_ prefix for common library functions (#9805)

2024-10-10 22:57:42 +02:00

base-translate.sh

…

chat-13B.bat

…

chat-13B.sh

…

chat-persistent.sh

scripts : fix pattern and get n_tokens in one go (#10221)

2024-11-09 09:06:54 +02:00

chat-vicuna.sh

…

chat.sh

…

CMakeLists.txt

Introduce llama-run (#10291)

2024-11-25 22:56:24 +01:00

convert_legacy_llama.py

metadata: Detailed Dataset Authorship Metadata (#8875)

2024-11-13 21:10:38 +11:00

json_schema_pydantic_example.py

…

json_schema_to_grammar.py

grammar : fix JSON Schema for string regex with top-level alt. (#9903)

2024-10-16 19:03:24 +03:00

llama.vim

llama.vim : bump generation time limit to 3s [no ci]

2024-10-23 17:16:56 +03:00

llm.vim

…

Miku.sh

…

pydantic_models_to_grammar_examples.py

examples : Rewrite pydantic_models_to_grammar_examples.py (#8493)

2024-07-20 22:09:17 -04:00

pydantic_models_to_grammar.py

…

reason-act.sh

…

regex_to_grammar.py

…

server_embd.py

…

server-llama2-13B.sh

…

ts-type-to-grammar.sh

…