Georgi Gerganov
c30e34cdba
Merge branch 'master' into gg/llama-kv-cache
...
ggml-ci
2025-01-29 15:01:26 +02:00
Georgi Gerganov
918885697e
llama : resolve rwkv conflict
...
ggml-ci
2025-01-29 14:45:04 +02:00
Eric Curtin
f0d4b29edf
Parse https://ollama.com/library/ syntax (#11480)
...
People search for ollama models using the web UI; this change
allows one to copy the URL from the browser and have it be
compatible with llama-run.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4585
2025-01-29 11:23:10 +00:00
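To illustrate the idea, here is a minimal sketch of stripping the web-UI prefix so a pasted URL resolves to the plain model name; the helper name is hypothetical, not llama-run's actual parser:

```cpp
#include <iostream>
#include <string>

// hypothetical helper: map a pasted https://ollama.com/library/ URL to the
// bare model name that llama-run already understands
static std::string resolve_ollama_url(const std::string & arg) {
    const std::string prefix = "https://ollama.com/library/";
    if (arg.rfind(prefix, 0) == 0) {
        return arg.substr(prefix.size()); // e.g. "llama3.2:3b"
    }
    return arg; // not an ollama.com URL, leave untouched
}

int main() {
    std::cout << resolve_ollama_url("https://ollama.com/library/llama3.2:3b") << "\n"; // llama3.2:3b
}
```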
Georgi Gerganov
815857791d
sync : ggml
2025-01-29 11:25:29 +02:00
William Tambellini
1a0e87d291
ggml : add option to not print stack on abort (ggml/1081)
...
* Add option to not print stack on abort
Add an option/envvar to disable stack printing on abort.
Also link some unit tests with Threads to fix link errors on
ubuntu/g++11.
* Update ggml/src/ggml.c
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
b4583
2025-01-29 11:24:53 +02:00
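In sketch form, such a guard looks something like the following; GGML_NO_BACKTRACE is an assumed name for illustration, not necessarily the envvar the commit added:

```cpp
#include <cstdio>
#include <cstdlib>

static void print_backtrace_stub(void) {
    fprintf(stderr, "(backtrace would be printed here)\n");
}

// assumed shape of the abort path: print the stack unless the user opted out
static void abort_with_message(const char * msg) {
    fprintf(stderr, "fatal: %s\n", msg);
    if (getenv("GGML_NO_BACKTRACE") == NULL) { // assumed envvar name
        print_backtrace_stub();
    }
    abort();
}

int main() {
    abort_with_message("example abort");
}
```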
issixx
d2e518e9b4
ggml-cpu : fix ggml_graph_compute_thread not terminating on abort (ggml/1065)
...
Some threads kept looping and failed to terminate properly after an abort during CPU execution.
Co-authored-by: issi <issi@gmail.com>
2025-01-29 11:24:51 +02:00
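The shape of the fix, reduced to a simplified sketch: workers poll a shared abort flag so they leave the compute loop instead of spinning forever. Names are illustrative, not the actual ggml-cpu internals:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

static std::atomic<bool> g_abort{false};

static void compute_thread(int /*id*/, int n_nodes) {
    for (int i = 0; i < n_nodes; ++i) {
        if (g_abort.load(std::memory_order_relaxed)) {
            return; // bail out instead of waiting on work that never comes
        }
        // ... process node i ...
    }
}

int main() {
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t) pool.emplace_back(compute_thread, t, 1000000);
    g_abort = true; // simulate an abort raised during graph execution
    for (auto & th : pool) th.join();
    printf("all threads terminated\n");
}
```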
Daniel Bevenius
b636228c0a
embedding : enable --no-warmup option (#11475)
...
This commit enables the `--no-warmup` option for llama-embedding.
The motivation for this change is to allow the user to disable the
warmup when running the program.
b4581
2025-01-29 10:38:54 +02:00
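For context, the warmup gate reduces to a flag check; the struct and function names below are illustrative rather than llama.cpp's actual common-params plumbing:

```cpp
#include <cstdio>
#include <cstring>

struct cli_params {
    bool no_warmup = false; // set by --no-warmup
};

static void warmup_model(void) {
    // run a throwaway decode so weights are paged in before real work
    printf("warming up...\n");
}

int main(int argc, char ** argv) {
    cli_params params;
    for (int i = 1; i < argc; ++i) {
        if (strcmp(argv[i], "--no-warmup") == 0) {
            params.no_warmup = true;
        }
    }
    if (!params.no_warmup) {
        warmup_model();
    }
    printf("computing embeddings...\n");
    return 0;
}
```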
Molly Sophia
325afb370a
llama: fix missing k_cache store for rwkv6qwen2 (#11445)
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
b4580
2025-01-29 12:07:21 +08:00
Emreerdog
794fe23f29
cmake: add hints for locating ggml on Windows using Llama find-package (#11466)
2025-01-28 19:22:06 -04:00
peidaqi
cf8cc856d7
server : Fixed wrong function name in llama.cpp server unit test (#11473)
...
The test_completion_stream_with_openai_library() function actually runs with stream=False by default, and test_completion_with_openai_library() with stream=True.
2025-01-29 00:03:42 +01:00
Xuan-Son Nguyen
d0c08040b6
ci : fix build CPU arm64 (#11472)
...
* ci : fix build CPU arm64
* failed, trying ubuntu 22
* vulkan: ubuntu 24
* vulkan : jammy --> noble
2025-01-29 00:02:56 +01:00
uvos
be5ef7963f
HIP: Suppress transformation warning in softmax.cu
...
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM can't unroll the loops here.
b4576
2025-01-28 23:06:32 +01:00
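The warning boils down to this compile-time/runtime split; below is a simplified stand-in for the kernel, not the actual softmax.cu code:

```cpp
// the loop bound is only a compile-time constant when ncols_template != 0
template <int ncols_template>
static void softmax_row(const float * x, float * dst, const int ncols_runtime) {
    const int ncols = ncols_template == 0 ? ncols_runtime : ncols_template;
#pragma unroll
    for (int i = 0; i < ncols; ++i) { // unrollable only in the fixed-size case
        dst[i] = x[i]; // the real kernel computes max/exp/sum here
    }
}

int main() {
    float x[4] = {1, 2, 3, 4}, dst[4];
    softmax_row<4>(x, dst, 4); // bound known at compile time: unroll is fine
    softmax_row<0>(x, dst, 4); // runtime bound: llvm cannot unroll, hence the warning
}
```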
Nikita Sarychev
cae9fb4361
HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080)
...
This disables the workaround on fixed rocblas versions (>= 4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.
b4575
2025-01-28 16:42:20 +01:00
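Version-gating a workaround like this typically looks as follows; ROCBLAS_VERSION_MAJOR is stubbed here so the sketch compiles without rocblas headers, and the exact gate used by the PR may differ:

```cpp
#include <cstdio>

#ifndef ROCBLAS_VERSION_MAJOR
#define ROCBLAS_VERSION_MAJOR 4 // stand-in; real builds get this from rocblas
#endif

static void rocblas_initialize_stub(void) {
    printf("eagerly loading all tensile objects...\n");
}

int main() {
#if ROCBLAS_VERSION_MAJOR < 4
    // affected versions mis-handle concurrent instantiation, so force
    // one-time initialization up front
    rocblas_initialize_stub();
#else
    printf("rocblas >= 4.0.0: workaround skipped, saving time and VRAM\n");
#endif
}
```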
Eric Curtin
7fee2889e6
Add github protocol pulling and http:// (#11465)
...
These are added as pulling protocols to llama-run.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4574
2025-01-28 14:45:41 +00:00
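A toy sketch of prefix-based source detection of this kind; the enum and default are illustrative, not llama-run's actual dispatch:

```cpp
#include <iostream>
#include <string>

enum class pull_source { HF, GITHUB, HTTP, OLLAMA };

static pull_source detect_source(const std::string & s) {
    if (s.rfind("github:", 0) == 0) return pull_source::GITHUB;
    if (s.rfind("http://", 0) == 0 || s.rfind("https://", 0) == 0) return pull_source::HTTP;
    if (s.rfind("hf://", 0) == 0)   return pull_source::HF;
    return pull_source::OLLAMA; // bare names fall through to the default registry
}

int main() {
    std::cout << (detect_source("github:user/repo") == pull_source::GITHUB) << "\n"; // 1
}
```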
Nuno
d7d1eccacc
docker: allow installing pip packages system-wide (#11437)
...
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 14:17:25 +00:00
someone13574
4bf3119d61
cmake : don't fail on GGML_CPU=OFF (#11457)
b4572
2025-01-28 15:15:34 +01:00
Nuno
f643120bad
docker: add perplexity and bench commands to full image (#11438)
...
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 10:42:32 +00:00
Akarshan Biswas
6e84b0ab8e
SYCL : SOFTMAX F16 mask support and other fixes (#11261)
...
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added during #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always considered src1 to be of fp32 type (many OP functions depend on this).
* SYCL: SOFTMAX F16 mask support and other fixes
* test-backend-ops: Add F16 mask test cases
b4570
2025-01-28 09:56:58 +00:00
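Reduced to plain C++, the decoupling amounts to the softmax path inspecting the mask's type itself; the types below are simplified stand-ins for ggml's, not the SYCL implementation:

```cpp
#include <cstdio>

enum tensor_type { TYPE_F32, TYPE_F16 }; // stand-ins for ggml's type enum

struct tensor {
    tensor_type  type;
    const void * data;
};

// the op checks the optional mask's type itself rather than relying on a
// wrapper that assumed every src1 is fp32
static void soft_max_with_mask(const tensor * mask) {
    if (mask == nullptr) {
        printf("softmax: no mask\n");
    } else if (mask->type == TYPE_F16) {
        printf("softmax: applying F16 mask\n");
    } else {
        printf("softmax: applying F32 mask\n");
    }
}

int main() {
    tensor m = { TYPE_F16, nullptr };
    soft_max_with_mask(&m);
}
```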
Michael Engel
2b8525d5c8
Handle missing model in CLI parameters for llama-run (#11399)
...
The HTTP client in llama-run only prints an error in case the download of
a resource fails. If the model name in the CLI parameter list is missing,
this causes the application to crash.
In order to prevent this, a check for the required model parameter has been
added, and errors for resource downloads get propagated to the caller.
Signed-off-by: Michael Engel <mengel@redhat.com>
b4569
2025-01-28 08:32:40 +00:00
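The two fixes, sketched in miniature with illustrative names: validate the required argument up front, and return download failures instead of only logging them.

```cpp
#include <cstdio>
#include <string>

static int download(const std::string & url) {
    // pretend the fetch failed; return the error instead of swallowing it
    fprintf(stderr, "download failed: %s\n", url.c_str());
    return 1;
}

static int run(const std::string & model) {
    if (model.empty()) {
        fprintf(stderr, "error: no model specified\n");
        return 1; // previously this path could crash instead
    }
    return download(model); // propagate the result to the caller
}

int main(int argc, char ** argv) {
    return run(argc > 1 ? argv[1] : "");
}
```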
Eric Curtin
a4417ddda9
Add new hf protocol for ollama (#11449)
...
https://huggingface.co/docs/hub/en/ollama
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4568
2025-01-27 19:36:10 +01:00
Haus1
d6d24cd9ed
AMD: parse the architecture as supplied by gcnArchName (#11244)
...
The value provided by minor doesn't include stepping for AMD; parse the value returned by gcnArchName instead to retrieve an accurate ID.
b4567
2025-01-27 14:58:17 +01:00
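A simplified sketch of pulling the ID out of a gcnArchName-style string such as "gfx90a:sramecc+:xnack-"; the real code reads the string from HIP's device properties, and the parsing here is deliberately reduced:

```cpp
#include <iostream>
#include <string>

static int parse_gfx_id(const std::string & arch_name) {
    // take the token before the first ':' and drop the "gfx" prefix
    const std::string id = arch_name.substr(0, arch_name.find(':'));
    if (id.rfind("gfx", 0) != 0) return -1;
    return std::stoi(id.substr(3), nullptr, 16); // gfx ids read naturally as hex
}

int main() {
    std::cout << std::hex << parse_gfx_id("gfx90a:sramecc+:xnack-") << "\n"; // 90a
}
```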
lexasub
a5203b4465
llama : minor fixes to speed up model loading (#11448)
...
* impl::load: change the bpe_ranks map to an unordered map, reducing impl::load time by 30%
* llama_model_loader::init_mapping: replace `new llama_mmap` with `std::make_unique<llama_mmap>` for cleaner code and to halve the running time of init_mappings
* Update src/llama-vocab.cpp
---------
Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
b4566
2025-01-27 14:42:09 +01:00
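For reference, swapping an ordered map for an unordered one on pair keys requires supplying a hash, which the comparison-based std::map never needed; this boost-style combiner is illustrative, not the one from the PR:

```cpp
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>

struct pair_hash {
    size_t operator()(const std::pair<std::string, std::string> & p) const {
        const size_t h1 = std::hash<std::string>{}(p.first);
        const size_t h2 = std::hash<std::string>{}(p.second);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

int main() {
    // O(1) average lookups during BPE merge ranking instead of O(log n)
    std::unordered_map<std::pair<std::string, std::string>, int, pair_hash> bpe_ranks;
    bpe_ranks[{"th", "e"}] = 0;
    std::cout << bpe_ranks.at({"th", "e"}) << "\n"; // 0
}
```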
Georgi Gerganov
e665b57fa2
Merge branch 'master' into gg/llama-kv-cache
...
ggml-ci
2025-01-27 14:09:22 +02:00
Johannes Gäßler
df984e0147
llama: refactor llama_decode_impl (#11381)
b4565
2025-01-27 12:07:12 +01:00
Ihar Hrachyshka
acd38efee3
metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441)
...
This fixes a segmentation fault when running tests on machines where no metal
devices are available (for example, when not linked with the Core Graphics
framework or otherwise).
b4564
2025-01-27 09:41:59 +02:00
Xuan Son Nguyen
caf773f249
docker : fix ARM build and Vulkan build (#11434)
...
* ci : do not fail-fast for docker
* build arm64/amd64 separately
* fix pip
* no fast fail
* vulkan: try jammy
2025-01-26 22:45:32 +01:00
Georgi Gerganov
a0c500b4dc
context : prepare for abstraction
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
99422dfa3f
context : introduce llama_batch_manager
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
cb8f2095c6
wip
2025-01-26 20:16:22 +02:00
Georgi Gerganov
133ad6a723
context : initial need_reserve logic
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
c75ba6851e
context : move adapter code in the implementation [no ci]
2025-01-26 20:16:22 +02:00
Georgi Gerganov
f0713498fd
context : add get_ctx_padding()
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
b4ec1d4429
cont : move kv_self update to llama_context
...
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
f2524c0e41
llama : remove references to llama_kv_cache (wip)
...
Intermediate step necessary to abstract the `llama_context` and
`llama_kv_cache`.
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
ae274f9747
llama : fix names [no ci]
2025-01-26 20:16:21 +02:00
Georgi Gerganov
a19f671fe0
context : minor
...
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
17b363afd3
llama : update llama_kv_self API
...
ggml-ci
2025-01-26 20:16:20 +02:00
Georgi Gerganov
fd05ab87aa
kv_cache : move state read/write to llama_kv_cache
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
4cd1b6fa4c
context : prepare kv_cache_read/write to be moved to kv_cache
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
73a14eccc9
kv_cache : minor
2025-01-26 20:14:36 +02:00
Georgi Gerganov
fef90cb3d7
kv_cache : fix
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
4d7bd03e65
kv_cache : functions -> members
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
e4550fbafc
llama : cont
...
ggml-ci
2025-01-26 20:14:35 +02:00
Georgi Gerganov
f78b396ee7
llama : add struct llama_kv_cache (wip) [no ci]
2025-01-26 20:12:06 +02:00
Georgi Gerganov
178a7eb952
metal : use residency sets (#11427)
...
* metal : use residency sets
ggml-ci
* metal : restore commandBufferWithUnretainedReferences calls [no ci]
* metal : release descriptors
ggml-ci
* metal : check env GGML_METAL_NO_RESIDENCY
ggml-ci
* metal : fix build + clean-up
ggml-ci
b4562
2025-01-26 20:06:16 +02:00
Nuno
6f53d8a6b4
docker: add missing vulkan library to base layer and update to 24.04 (#11422)
...
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-26 18:22:43 +01:00
bandoti
19f65187cb
cmake: add ggml find package (#11369)
...
* Add initial ggml cmake package
* Add build numbers to ggml find-package
* Expand variables with GGML_ prefix
* Guard against adding to cache variable twice
* Add git to msys2 workflow
* Handle ggml-cpu-* variants
* Link ggml/ggml-base libraries to their targets
* Replace main-cmake-pkg with simple-cmake-pkg
* Interface features require c_std_90
* Fix typo
* Removed unnecessary bracket from status message
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4560
2025-01-26 12:07:48 -04:00
Frank Mai
1d8ee06000
rpc: fix register position (#11424)
...
Signed-off-by: thxCode <thxcode0824@gmail.com>
b4559
2025-01-26 16:20:34 +01:00
Georgi Gerganov
2cc9b8c32c
readme : update hot topics
2025-01-26 14:30:15 +02:00
Jeff Bolz
f35726c2fb
build: apply MSVC /bigobj option to c/cpp files only (#11423)
b4557
2025-01-26 03:10:03 +01:00