llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-08-22 07:24:06 -04:00

Files

Douglas Hanley 03bf161eb6 llama : support batched embeddings (#5466)

* batched embedding: pool outputs by sequence id. updated embedding example

* bring back non-causal attention

* embd : minor improvements

* llama : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2024-02-13 14:06:58 +02:00
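The first bullet of the commit message above, "pool outputs by sequence id", can be illustrated with a small standalone sketch. This is not the code from the commit (which lives in the C/C++ sources); it assumes mean pooling, and the function name `mean_pool_by_seq_id` and the toy batch are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def mean_pool_by_seq_id(token_embeddings, seq_ids):
    """Average the token vectors that share a sequence id.

    Hypothetical helper: assumes mean pooling and one sequence id per token.
    Returns a dict mapping each sequence id to its pooled vector.
    """
    if len(token_embeddings) != len(seq_ids):
        raise ValueError("need exactly one sequence id per token")

    sums = {}                  # seq id -> running element-wise sum
    counts = defaultdict(int)  # seq id -> number of tokens seen

    for vec, sid in zip(token_embeddings, seq_ids):
        if sid not in sums:
            sums[sid] = [0.0] * len(vec)
        acc = sums[sid]
        for i, x in enumerate(vec):
            acc[i] += x
        counts[sid] += 1

    # Divide each accumulated sum by the token count of its sequence.
    return {sid: [x / counts[sid] for x in acc] for sid, acc in sums.items()}


# Toy batch: three tokens, the first two belong to sequence 0, the last to sequence 1.
batch = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
seq_ids = [0, 0, 1]
pooled = mean_pool_by_seq_id(batch, seq_ids)
print(pooled)
```

Grouping by sequence id is what lets several inputs share one batch: each token carries the id of the sequence it belongs to, and the pooled result has one vector per id rather than one per token.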

__init__.py

gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)

2023-11-11 08:04:50 +03:00

constants.py

llama : support batched embeddings (#5466)

2024-02-13 14:06:58 +02:00

gguf_reader.py

gguf : fix "general.alignment" type in gguf_reader.py (#5136)

2024-01-26 11:10:28 +02:00

gguf_writer.py

llama : support batched embeddings (#5466)

2024-02-13 14:06:58 +02:00

gguf.py

gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)

2023-11-11 08:04:50 +03:00

py.typed

convert : various script cleanups/fixes + merges and special token handling (#2842)

2023-08-30 11:25:50 +03:00

tensor_mapping.py

Add support for BERT embedding models (#5423)

2024-02-11 11:21:38 -05:00

vocab.py

py : open merges file as 'utf-8' (#4566)

2023-12-21 19:07:34 +02:00