cb8f2095c6
wip
2025-01-26 20:16:22 +02:00
133ad6a723
context : initial need_reserve logic
...
ggml-ci
2025-01-26 20:16:22 +02:00
c75ba6851e
context : move adapter code in the implementation [no ci]
2025-01-26 20:16:22 +02:00
f0713498fd
context : add get_ctx_padding()
...
ggml-ci
2025-01-26 20:16:22 +02:00
b4ec1d4429
cont : move kv_self update to llama_context
...
ggml-ci
2025-01-26 20:16:21 +02:00
f2524c0e41
llama : remove references to llama_kv_cache (wip)
...
Intermediate step necessary to abstract the `llama_context` and
`llama_kv_cache`.
ggml-ci
2025-01-26 20:16:21 +02:00
ae274f9747
llama : fix names [no ci]
2025-01-26 20:16:21 +02:00
a19f671fe0
context : minor
...
ggml-ci
2025-01-26 20:16:21 +02:00
17b363afd3
llama : update llama_kv_self API
...
ggml-ci
2025-01-26 20:16:20 +02:00
fd05ab87aa
kv_cache : move state read/write to llama_kv_cache
...
ggml-ci
2025-01-26 20:14:36 +02:00
4cd1b6fa4c
context : prepare kv_cache_read/write to be moved to kv_cache
...
ggml-ci
2025-01-26 20:14:36 +02:00
73a14eccc9
kv_cache : minor
2025-01-26 20:14:36 +02:00
fef90cb3d7
kv_cache : fix
...
ggml-ci
2025-01-26 20:14:36 +02:00
4d7bd03e65
kv_cache : functions -> members
...
ggml-ci
2025-01-26 20:14:36 +02:00
e4550fbafc
llama : cont
...
ggml-ci
2025-01-26 20:14:35 +02:00
f78b396ee7
llama : add struct llama_kv_cache (wip) [no ci]
2025-01-26 20:12:06 +02:00
178a7eb952
metal : use residency sets ( #11427 )
...
* metal : use residency sets
ggml-ci
* metal : restore commandBufferWithUnretainedReferences calls [no ci]
* metal : release descriptors
ggml-ci
* metal : check env GGML_METAL_NO_RESIDENCY
ggml-ci
* metal : fix build + clean-up
ggml-ci
b4562
2025-01-26 20:06:16 +02:00
6f53d8a6b4
docker: add missing vulkan library to base layer and update to 24.04 ( #11422 )
...
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-26 18:22:43 +01:00
19f65187cb
cmake: add ggml find package ( #11369 )
...
* Add initial ggml cmake package
* Add build numbers to ggml find-package
* Expand variables with GGML_ prefix
* Guard against adding to cache variable twice
* Add git to msys2 workflow
* Handle ggml-cpu-* variants
* Link ggml/ggml-base libraries to their targets
* Replace main-cmake-pkg with simple-cmake-pkg
* Interface features require c_std_90
* Fix typo
* Removed unnecessary bracket from status message
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4560
2025-01-26 12:07:48 -04:00
1d8ee06000
rpc: fix register position ( #11424 )
...
Signed-off-by: thxCode <thxcode0824@gmail.com>
b4559
2025-01-26 16:20:34 +01:00
2cc9b8c32c
readme : update hot topics
2025-01-26 14:30:15 +02:00
f35726c2fb
build: apply MSVC /bigobj option to c/cpp files only ( #11423 )
b4557
2025-01-26 03:10:03 +01:00
4a75d19376
vulkan: compile shaders on-demand ( #11406 )
...
Reduce first-run startup time and memory consumption.
Should fix #11339.
2025-01-25 22:29:57 +01:00
26771a1491
Hip: disable VMM on hip as it seems that it doesn't work in some configurations ( #11420 )
2025-01-25 21:01:12 +01:00
ca6baf76c1
build: add /bigobj to MSVC build ( #11407 )
2025-01-25 11:26:37 -06:00
6e264a905b
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for ( #11419 )
2025-01-25 17:22:41 +01:00
49b0e3cec4
server : fix cleaning up stream task ( #11418 )
...
* server : fix cleaning up stream task
* one more spot
b4552
2025-01-25 16:36:44 +01:00
20a758155b
docker : fix CPU ARM build ( #11403 )
...
* docker : fix CPU ARM build
* add CURL to other builds
2025-01-25 15:22:29 +01:00
00c24acb2a
ci : fix line breaks on windows builds ( #11409 )
...
* ci : fix line breaks on windows builds
* cont : another try
* ci : fix powershell line breaks
b4550
2025-01-25 13:36:48 +02:00
466ea66f33
CANN: Add Ascend CANN build ci ( #10217 )
...
* CANN: Add Ascend CANN build ci
* Update build.yml
* Modify cann image version
* Update build.yml
* Change to run on x86 system
* Update build.yml
* Update build.yml
* Modify format error
* Update build.yml
* Add 'Ascend NPU' label restrictions
* Exclude non PR event
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
* Update build.yml
---------
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
b4549
2025-01-25 00:26:01 +01:00
5f0db9522f
hip : Add hipGraph and VMM support to ROCM ( #11362 )
...
* Add hipGraph support
* Enable VMM on rocm
b4548
2025-01-25 00:02:23 +01:00
c5d9effb49
CUDA: fix FP16 cuBLAS GEMM ( #11396 )
b4547
2025-01-24 21:02:43 +01:00
9fbadaef4f
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna ( #11356 )
b4546
2025-01-24 17:50:49 +01:00
9755129c27
release : pack /lib in the packages ( #11392 )
...
* release : pack /lib and /include in the packages
* cmake : put libs in /bin
* TMP : push artifacts
* Revert "TMP : push artifacts"
This reverts commit 4decf2c4df.
* ci : fix HIP cmake compiler options to be on first line
* ci : restore the original HIP commands
* ci : change ubuntu build from latest to 20.04
* ci : try to fix macos build rpaths
* ci : remove obsolete MacOS build
* TMP : push artifacts
* ci : change back to ubuntu latest
* ci : macos set build rpath to "@loader_path"
* ci : fix typo
* ci : change ubuntu package to 22.04
* Revert "TMP : push artifacts"
This reverts commit 537b09e70f.
b4545
2025-01-24 18:41:30 +02:00
a07c2c8a52
docs : Update readme to build targets for local docker build ( #11368 )
2025-01-24 14:30:13 +01:00
8137b4bb2b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support ( #11380 )
b4543
2025-01-24 12:38:31 +01:00
1af6945eb0
cmake : avoid -march=native when reproducible build is wanted ( #11366 )
...
See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.
Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.
Fixes: #11317
This patch was done while working on reproducible builds for openSUSE.
b4542
2025-01-24 13:21:35 +02:00
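The idea behind 1af6945eb0 can be illustrated with a minimal sketch. It is not the CMake change itself: it assumes that "reproducible build is wanted" is detected by the presence of `SOURCE_DATE_EPOCH` in the environment, and the function name `arch_flags` is hypothetical.

```python
import os

def arch_flags(env=None):
    """Return the host-specific compiler flags to use (illustrative only).

    Assumption: a reproducible build is requested by defining
    SOURCE_DATE_EPOCH, as the reproducible-builds.org spec describes.
    """
    env = os.environ if env is None else env
    if "SOURCE_DATE_EPOCH" in env:
        # Host-tuned code would differ between build machines, so skip it.
        return []
    return ["-march=native"]

if __name__ == "__main__":
    print(arch_flags({"SOURCE_DATE_EPOCH": "1737700000"}))
    print(arch_flags({}))
```

Dropping `-march=native` trades some host-specific performance for binaries that do not depend on the CPU of the machine that built them.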
01f37edf1a
Update llama-run README.md ( #11386 )
...
For consistency
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-24 09:39:24 +00:00
c07e87f38b
server : (webui) put DeepSeek R1 CoT in a collapsible <details> element ( #11364 )
...
* webui : put DeepSeek R1 CoT in a collapsible <details> element
* webui: refactor split
* webui: don't use regex to split cot and response
* webui: format+qol
* webui: no loading icon if the model isn't generating
* ui fix, add configs
* add jsdoc types
* only filter </think> for assistant msg
* build
* update build
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-24 09:02:38 +01:00
564804b79b
tests: fix some mul_mat test gaps ( #11375 )
...
Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
b4539
2025-01-23 14:51:24 -06:00
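The test gap described in 564804b79b can be sketched as follows. This is a simplified model, not the test-backend-ops code: it assumes the only dispatch criterion is the column count n, with batched mat-vec used for n up to 8 as the commit message states. The names `mul_mat_path` and `covers_both_paths` are hypothetical.

```python
MAX_MAT_VEC_COLUMNS = 8  # from the commit message: batched mat-vec handles n <= 8

def mul_mat_path(n):
    """Name the kernel family that would handle n columns (simplified model)."""
    return "mat-vec" if n <= MAX_MAT_VEC_COLUMNS else "mat-mat"

def covers_both_paths(ns):
    """True if the tested sizes exercise both the mat-vec and mat-mat paths."""
    return {mul_mat_path(n) for n in ns} == {"mat-vec", "mat-mat"}

if __name__ == "__main__":
    print(covers_both_paths(range(1, 9)))   # sizes 1..8 only
    print(covers_both_paths(range(1, 10)))  # sizes 1..9
```

Under this model, a test set that stops at n==8 never reaches the mat-mat path, which is why adding n==9 closes the gap.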
05f63cc9ee
Update documentation ( #11373 )
...
To show that -n, -ngl, and --ngl are all acceptable.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4538
2025-01-23 20:04:31 +00:00
f7fb43cd0b
Add -ngl ( #11372 )
...
Most other llama.cpp cli tools accept -ngl with a single dash.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4537
2025-01-23 16:16:18 +00:00
5845661640
server : add more clean up when cancel_tasks is called ( #11340 )
...
* server : add more clean up when cancel_tasks is called
* fix recv_with_timeout
* std::remove_if
* fix std::remove_if
b4536
2025-01-23 13:56:05 +01:00
f211d1dc10
Treat hf.co/ prefix the same as hf:// ( #11350 )
...
ollama uses hf.co/ to specify huggingface prefix, like RamaLama
uses hf://
Treat them similarly.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4535
2025-01-23 10:38:20 +00:00
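The prefix handling described in f211d1dc10 can be sketched as below. This is an illustration in Python, not the llama-run implementation: the helper name `split_hf_reference` and the returned tuple shape are assumptions; only the two prefixes come from the commit message.

```python
HF_PREFIXES = ("hf://", "hf.co/")  # both are treated as the Hugging Face prefix

def split_hf_reference(model):
    """Return (is_huggingface, remainder) for a model reference string."""
    for prefix in HF_PREFIXES:
        if model.startswith(prefix):
            # Strip the prefix so both spellings yield the same remainder.
            return True, model[len(prefix):]
    return False, model

if __name__ == "__main__":
    print(split_hf_reference("hf://org/repo"))
    print(split_hf_reference("hf.co/org/repo"))
    print(split_hf_reference("model.gguf"))
```

Normalizing both spellings to the same remainder means the rest of the download logic needs only one code path.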
955a6c2d91
Vulkan-run-test: fix mmq_wg_denoms ( #11343 )
...
This looks like a copy-and-paste error.
*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.
b4534
2025-01-23 08:14:28 +01:00
1971adf55e
vulkan: sort shaders for more deterministic binary ( #11315 )
...
Fixes #11306.
b4533
2025-01-23 08:07:50 +01:00
5245729e33
vulkan: fix diag_mask_inf ( #11323 )
...
With robustbufferaccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline.
b4532
2025-01-23 08:01:17 +01:00
6152129d05
main : update README documentation for batch size ( #11353 )
...
* main : update README documentation for batch size
* fix formatting
* minor
2025-01-22 19:22:20 +01:00
16d3df7ab0
readme : add plugin links ( #11355 )
2025-01-22 19:44:26 +02:00
12c2bdf2de
server : fix draft context not being released ( #11354 )
b4529
2025-01-22 17:44:40 +01:00