* gguf util : add SafetensorRemote (see the sketch after the commit list)
* fix style
* convert: add --remote option
* convert : allow using lazy remote tensors
It's a bit slow for now since everything is blocking and single-threaded (a lazy-read sketch follows at the end of this message).
* correct metadata.name
* small style fix
* support HF_TOKEN
* convert : use writeable buffer for remote lazy tensors
* convert : fix flake8 lint regarding lambda assignment
* multithreaded download
* multithread: print debug
* fix style
* Revert "multithreaded download"
This reverts commit 42fc895ace.
* bring back _get_request_headers
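
For context on the remote-loading commits above: a safetensors file begins with an 8-byte little-endian length followed by a JSON header that maps tensor names to dtypes, shapes, and byte offsets, so the tensor index of a Hugging Face checkpoint can be fetched with HTTP range requests before any tensor data is downloaded. The snippet below is a minimal, self-contained sketch of that idea, not the gguf-py SafetensorRemote API; the model ID, filename, and helper names are illustrative assumptions, and the HF_TOKEN handling only mirrors what a request-headers helper like _get_request_headers is for.

    # Minimal sketch, not the gguf-py API: fetch a safetensors tensor index
    # from the Hugging Face Hub with HTTP range requests. Model ID, filename
    # and helper names are illustrative assumptions.
    import json
    import os
    import struct

    import requests

    BASE_DOMAIN = "https://huggingface.co"

    def _request_headers() -> dict[str, str]:
        # HF_TOKEN, if set, authorizes access to gated or private repos.
        headers = {"User-Agent": "remote-safetensors-sketch"}
        token = os.environ.get("HF_TOKEN")
        if token:
            headers["Authorization"] = f"Bearer {token}"
        return headers

    def remote_tensor_index(model_id: str, filename: str = "model.safetensors") -> dict:
        url = f"{BASE_DOMAIN}/{model_id}/resolve/main/{filename}"
        # The first 8 bytes hold the JSON header size (little-endian uint64).
        r = requests.get(url, headers={**_request_headers(), "Range": "bytes=0-7"})
        r.raise_for_status()
        (header_size,) = struct.unpack("<Q", r.content)
        # Fetch only the JSON header: tensor name -> dtype, shape, data_offsets.
        r = requests.get(url, headers={**_request_headers(), "Range": f"bytes=8-{7 + header_size}"})
        r.raise_for_status()
        return json.loads(r.content)

    if __name__ == "__main__":
        index = remote_tensor_index("Qwen/Qwen2.5-0.5B-Instruct")  # example model ID
        for name, meta in index.items():
            if name != "__metadata__":
                print(name, meta["dtype"], meta["shape"], meta["data_offsets"])

With the index available remotely, conversion can in principle be pointed straight at a Hub model ID via the new --remote option (the exact invocation may differ; check the script's --help).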
---------
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
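
On the lazy remote tensors and the writable-buffer fix: a hedged sketch of reading a single tensor on demand into a mutable buffer. The function name and parameters are assumptions for illustration, not the conversion script's actual code; a bytearray is used because numpy arrays built from immutable bytes are read-only.

    # Illustrative only: fetch one tensor's bytes lazily with a range request
    # and expose them through a writable buffer for in-place modification.
    import numpy as np
    import requests

    def read_remote_tensor(url: str, data_start: int, start: int, end: int,
                           dtype: np.dtype, shape: tuple, headers: dict) -> np.ndarray:
        # data_start is 8 + header_size; start/end come from the header's data_offsets.
        rng = f"bytes={data_start + start}-{data_start + end - 1}"  # HTTP ranges are inclusive
        r = requests.get(url, headers={**headers, "Range": rng})
        r.raise_for_status()
        buf = bytearray(r.content)  # bytearray -> np.frombuffer yields a writeable array
        return np.frombuffer(buf, dtype=dtype).reshape(shape)

Each such request blocks, which matches the note above that the first version is slow and single-threaded; the multithreaded download attempt was later reverted.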