Commit Graph

4580 Commits

Author SHA1 Message Date
Molly Sophia
325afb370a llama: fix missing k_cache store for rwkv6qwen2 (#11445)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
b4580
2025-01-29 12:07:21 +08:00
Emreerdog
794fe23f29 cmake: add hints for locating ggml on Windows using Llama find-package (#11466) 2025-01-28 19:22:06 -04:00
peidaqi
cf8cc856d7 server : Fixed wrong function name in llamacpp server unit test (#11473)
The test_completion_stream_with_openai_library() function is actually with stream=False by default, and test_completion_with_openai_library() with stream=True
2025-01-29 00:03:42 +01:00
Xuan-Son Nguyen
d0c08040b6 ci : fix build CPU arm64 (#11472)
* ci : fix build CPU arm64

* failed, trying ubuntu 22

* vulkan: ubuntu 24

* vulkan : jammy --> noble
2025-01-29 00:02:56 +01:00
uvos
be5ef7963f HIP: Supress transformation warning in softmax.cu
loops with bounds not known at compile time can not be unrolled.
when ncols_template == 0, the bounds of the loop are not constexpr, thus llvm cant unroll the loops here.
b4576
2025-01-28 23:06:32 +01:00
Nikita Sarychev
cae9fb4361 HIP: Only call rocblas_initialize on rocblas versions with the multiple instantation bug (#11080)
This disables the workaround on rocblas fixed versions (>=4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.
b4575
2025-01-28 16:42:20 +01:00
Eric Curtin
7fee2889e6 Add github protocol pulling and http:// (#11465)
As pulling protocols to llama-run

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4574
2025-01-28 14:45:41 +00:00
Nuno
d7d1eccacc docker: allow installing pip packages system-wide (#11437)
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 14:17:25 +00:00
someone13574
4bf3119d61 cmake : don't fail on GGML_CPU=OFF (#11457) b4572 2025-01-28 15:15:34 +01:00
Nuno
f643120bad docker: add perplexity and bench commands to full image (#11438)
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 10:42:32 +00:00
Akarshan Biswas
6e84b0ab8e SYCL : SOFTMAX F16 mask support and other fixes (#11261)
Implemented ggml_sycl_op_soft_max() F16 src1(mask) support for which a pragma deprecation warning was added during #5021.
To do this, had to decouple it from ggml_sycl_op_flatten which always considered src1 to be of fp32 type(many OP functions are dependent on it).

* SYCL: SOFTMAX F16 mask support and other fixes

* test-backend-ops: Add F16 mask test cases
b4570
2025-01-28 09:56:58 +00:00
Michael Engel
2b8525d5c8 Handle missing model in CLI parameters for llama-run (#11399)
The HTTP client in llama-run only prints an error in case the download of
a resource failed. If the model name in the CLI parameter list is missing,
this causes the application to crash.
In order to prevent this, a check for the required model parameter has been
added and errors for resource downloads get propagated to the caller.

Signed-off-by: Michael Engel <mengel@redhat.com>
b4569
2025-01-28 08:32:40 +00:00
Eric Curtin
a4417ddda9 Add new hf protocol for ollama (#11449)
https://huggingface.co/docs/hub/en/ollama

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4568
2025-01-27 19:36:10 +01:00
Haus1
d6d24cd9ed AMD: parse the architecture as supplied by gcnArchName (#11244)
The value provided by minor doesn't include stepping for AMD, parse the value returned by gcnArchName instead to retrieve an accurate ID.
b4567
2025-01-27 14:58:17 +01:00
lexasub
a5203b4465 llama : minor fixes for up llama load model speed (#11448)
* impl::load change map bpe_ranks to onordered map for reduce time of impl::load on 30%

* llama_model_loader::init_mapping - replace new llama_mmap to std::make_unique<llama_mmap> for clean code & reduce (/2) time of running init_mappings

* Update src/llama-vocab.cpp

---------

Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
b4566
2025-01-27 14:42:09 +01:00
Johannes Gäßler
df984e0147 llama: refactor llama_decode_impl (#11381) b4565 2025-01-27 12:07:12 +01:00
Ihar Hrachyshka
acd38efee3 metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441)
This fixes segmentation fault error when running tests when no metal
devices are available (for example, when not linked with Core Graphics
framework or otherwise).
b4564
2025-01-27 09:41:59 +02:00
Xuan Son Nguyen
caf773f249 docker : fix ARM build and Vulkan build (#11434)
* ci : do not fail-fast for docker

* build arm64/amd64 separatedly

* fix pip

* no fast fail

* vulkan: try jammy
2025-01-26 22:45:32 +01:00
Georgi Gerganov
178a7eb952 metal : use residency sets (#11427)
* metal : use residency sets

ggml-ci

* metal : restore commandBufferWithUnretainedReferences calls [no ci]

* metal : release descriptors

ggml-ci

* metal : check env GGML_METAL_NO_RESIDENCY

ggml-ci

* metal : fix build + clean-up

ggml-ci
b4562
2025-01-26 20:06:16 +02:00
Nuno
6f53d8a6b4 docker: add missing vulkan library to base layer and update to 24.04 (#11422)
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-26 18:22:43 +01:00
bandoti
19f65187cb cmake: add ggml find package (#11369)
* Add initial ggml cmake package

* Add build numbers to ggml find-package

* Expand variables with GGML_ prefix

* Guard against adding to cache variable twice

* Add git to msys2 workflow

* Handle ggml-cpu-* variants

* Link ggml/ggml-base libraries to their targets

* Replace main-cmake-pkg with simple-cmake-pkg

* Interface features require c_std_90

* Fix typo

* Removed unnecessary bracket from status message

* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4560
2025-01-26 12:07:48 -04:00
Frank Mai
1d8ee06000 rpc: fix register position (#11424)
Signed-off-by: thxCode <thxcode0824@gmail.com>
b4559
2025-01-26 16:20:34 +01:00
Georgi Gerganov
2cc9b8c32c readme : update hot topics 2025-01-26 14:30:15 +02:00
Jeff Bolz
f35726c2fb build: apply MSVC /bigobj option to c/cpp files only (#11423) b4557 2025-01-26 03:10:03 +01:00
Jeff Bolz
4a75d19376 vulkan: compile shaders on-demand (#11406)
Reduce first-run startup time and memory consumption.

Should fix #11339.
2025-01-25 22:29:57 +01:00
uvos
26771a1491 Hip: disable VMM on hip as it seams that it dosent work in some configurations (#11420) 2025-01-25 21:01:12 +01:00
Jeff Bolz
ca6baf76c1 build: add /bigobj to MSVC build (#11407) 2025-01-25 11:26:37 -06:00
Diego Devesa
6e264a905b docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419) 2025-01-25 17:22:41 +01:00
Xuan Son Nguyen
49b0e3cec4 server : fix cleaning up stream task (#11418)
* server : fix cleaning up stream task

* one more spot
b4552
2025-01-25 16:36:44 +01:00
Diego Devesa
20a758155b docker : fix CPU ARM build (#11403)
* docker : fix CPU ARM build

* add CURL to other builds
2025-01-25 15:22:29 +01:00
Georgi Gerganov
00c24acb2a ci : fix line breaks on windows builds (#11409)
* ci : fix line breaks on windows builds

* cont : another try

* ci : fix powershell line breaks
b4550
2025-01-25 13:36:48 +02:00
jiahao su
466ea66f33 CANN: Add Ascend CANN build ci (#10217)
* CANN: Add Ascend CANN build ci

* Update build.yml

* Modify cann image version

* Update build.yml

* Change to run on x86 system

* Update build.yml

* Update build.yml

* Modify format error

* Update build.yml

* Add 'Ascend NPU' label restrictions

* Exclude non PR event

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>

* Update build.yml

---------

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
b4549
2025-01-25 00:26:01 +01:00
uvos
5f0db9522f hip : Add hipGraph and VMM support to ROCM (#11362)
* Add hipGraph support

* Enable VMM on rocm
b4548
2025-01-25 00:02:23 +01:00
Johannes Gäßler
c5d9effb49 CUDA: fix FP16 cuBLAS GEMM (#11396) b4547 2025-01-24 21:02:43 +01:00
uvos
9fbadaef4f rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356) b4546 2025-01-24 17:50:49 +01:00
Georgi Gerganov
9755129c27 release : pack /lib in the packages (#11392)
* release : pack /lib and /include in the packages

* cmake : put libs in /bin

* TMP : push artifacts

* Revert "TMP : push artifacts"

This reverts commit 4decf2c4df.

* ci : fix HIP cmake compiler options to be on first line

* ci : restore the original HIP commands

* ci : change ubuntu build from latest to 20.04

* ci : try to fix macos build rpaths

* ci : remove obsolete MacOS build

* TMP : push artifacts

* ci : change back to ubuntu latest

* ci : macos set build rpath to "@loader_path"

* ci : fix typo

* ci : change ubuntu package to 22.04

* Revert "TMP : push artifacts"

This reverts commit 537b09e70f.
b4545
2025-01-24 18:41:30 +02:00
Jafar Uruç
a07c2c8a52 docs : Update readme to build targets for local docker build (#11368) 2025-01-24 14:30:13 +01:00
Johannes Gäßler
8137b4bb2b CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380) b4543 2025-01-24 12:38:31 +01:00
Bernhard M. Wiedemann
1af6945eb0 cmake : avoid -march=native when reproducible build is wanted (#11366)
See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.

Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.

Fixes: #11317

This patch was done while working on reproducible builds for openSUSE.
b4542
2025-01-24 13:21:35 +02:00
Eric Curtin
01f37edf1a Update llama-run README.md (#11386)
For consistency

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-24 09:39:24 +00:00
stduhpf
c07e87f38b server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364)
* webui : put DeepSeek R1 CoT in a collapsible <details> element

* webui: refactor split

* webui: don't use regex to split cot and response

* webui: format+qol

* webui: no loading icon if the model isn't generating

* ui fix, add configs

* add jsdoc types

* only filter </think> for assistant msg

* build

* update build

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-24 09:02:38 +01:00
Jeff Bolz
564804b79b tests: fix some mul_mat test gaps (#11375)
Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
b4539
2025-01-23 14:51:24 -06:00
Eric Curtin
05f63cc9ee Update documentation (#11373)
To show -n, -ngl, --ngl is acceptable.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4538
2025-01-23 20:04:31 +00:00
Eric Curtin
f7fb43cd0b Add -ngl (#11372)
Most other llama.cpp cli tools accept -ngl with a single dash.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4537
2025-01-23 16:16:18 +00:00
Xuan Son Nguyen
5845661640 server : add more clean up when cancel_tasks is called (#11340)
* server : add more clean up when cancel_tasks is called

* fix recv_with_timeout

* std::remove_if

* fix std::remove_if
b4536
2025-01-23 13:56:05 +01:00
Eric Curtin
f211d1dc10 Treat hf.co/ prefix the same as hf:// (#11350)
ollama uses hf.co/ to specify huggingface prefix, like RamaLama
uses hf://

Treat them similarly.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4535
2025-01-23 10:38:20 +00:00
amd-dwang
955a6c2d91 Vulkan-run-test: fix mmq_wg_denoms (#11343)
There should be a copy-and-paste error here.

*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.
b4534
2025-01-23 08:14:28 +01:00
Jeff Bolz
1971adf55e vulkan: sort shaders for more deterministic binary (#11315)
Fixes #11306.
b4533
2025-01-23 08:07:50 +01:00
Jeff Bolz
5245729e33 vulkan: fix diag_mask_inf (#11323)
With robustbufferaccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgrouop dimensions were reversed vs
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline.
b4532
2025-01-23 08:01:17 +01:00
Diego Devesa
6152129d05 main : update README documentation for batch size (#11353)
* main : update README documentation for batch size

* fix formatting

* minor
2025-01-22 19:22:20 +01:00