llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-06-26 19:55:04 +00:00

Author	SHA1	Message	Date
Christian Kastner	532802f938	Implement GGML_CPU_ALL_VARIANTS for ARM (#14080 ) * ggml-cpu: Factor out feature detection build from x86 * ggml-cpu: Add ARM feature detection and scoring This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT> which need to be set in cmake, instead of GGML_<FEAT> that users would set for x86. This is because on ARM, users specify features with GGML_CPU_ARM_ARCH, rather than with individual flags. * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM Like x86, however to pass around arch flags within cmake, we use GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>. Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring function sort out which one can be used. * ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now The other platforms will need their own specific variants. This also fixes the bug that the variant-building branch was always being executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch which restores the previous behavior. b5639	2025-06-11 21:07:44 +02:00
Sigbjørn Skjæret	d4e0d95cf5	chore : clean up relative source dir paths (#14128 ) b5638	2025-06-11 19:04:23 +02:00
Sigbjørn Skjæret	cc66a7f78f	tests : add test-tokenizers-repo (#14017 ) b5637	2025-06-11 17:16:32 +02:00
Jeff Bolz	bd248d4dc7	vulkan: Better thread-safety for command pools/buffers (#14116 ) This change moves the command pool/buffer tracking into a vk_command_pool structure. There are two instances per context (for compute+transfer) and two instances per device for operations that don't go through a context. This should prevent separate contexts from stomping on each other. b5636	2025-06-11 09:48:52 -05:00
Aman	7781e5fe99	webui: Wrap long numbers instead of infinite horizontal scroll (#14062 ) * webui: Wrap long numbers instead of infinite horizontal scroll * Use tailwind class * update index.html.gz	2025-06-11 16:42:25 +02:00
Georgi Gerganov	89a184fa71	kv-cache : relax SWA masking condition (#14119 ) ggml-ci b5634	2025-06-11 16:48:45 +03:00
Taylor	2baf07727f	server : pass default --keep argument (#14120 ) b5633	2025-06-11 13:43:43 +03:00
Georgi Gerganov	7ae2932116	kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121 ) b5632	2025-06-11 12:52:45 +03:00
Jeff Bolz	1f7d50b293	vulkan: Track descriptor pools/sets per-context (#14109 ) Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8) and move it to the vk_device. Move all the descriptor pool and set tracking to the context - none of it is specific to pipelines anymore. It has a single vector of pools and vector of sets, and a single counter to track requests and a single counter to track use. b5631	2025-06-11 07:19:25 +02:00
lhez	4c763c8d1b	opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003 ) b5630	2025-06-10 16:55:58 -07:00
compilade	dad5c44398	kv-cache : avoid modifying recurrent cells when setting inputs (#13834 ) * kv-cache : avoid modifying recurrent cells when setting inputs * kv-cache : remove inp_s_mask It was replaced with equivalent and simpler functionality with rs_z (the first zeroed state) and the already-existing inp_s_copy. * kv-cache : fix non-consecutive token pos warning for recurrent models The problem was apparently caused by how the tail cells were swapped. * graph : simplify logic for recurrent state copies * kv-cache : use cell without src refs for rs_z in recurrent cache * llama-graph : fix recurrent state copy The `state_copy` shuffle assumes everything is moved at once, which is not true when `states_extra` is copied back to the cache before copying the range of states between `head` and `head + n_seqs`. This is only a problem if any of the cells in [`head`, `head + n_seqs`) have an `src` in [`head + n_seqs`, `head + n_kv`), which does happen when `n_ubatch > 1` in the `llama-parallel` example. Changing the order of the operations avoids the potential overwrite before use, although when copies are avoided (like with Mamba2), this will require further changes. * llama-graph : rename n_state to state_size in build_recurrent_state This naming should reduce confusion between the state size and the number of states. b5629	2025-06-10 18:20:14 -04:00
Sigbjørn Skjæret	55f6b9fa65	convert : fix duplicate key DeepSeek-R1 conversion error (#14103 )	2025-06-10 23:29:52 +02:00
Sigbjørn Skjæret	3678b838bb	llama : support GEGLU for jina-bert-v2 (#14090 ) b5627	2025-06-10 18:02:08 +02:00
Jeff Bolz	652b70e667	vulkan: force device 0 in CI (#14106 )	2025-06-10 10:53:47 -05:00
Juk Armstrong	3a12db23b6	Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104 ) b5625	2025-06-10 16:48:07 +01:00
Georgi Gerganov	ae92c1855b	sync : ggml ggml-ci b5624	2025-06-10 18:39:33 +03:00
Georgi Gerganov	b7ce1ad1e3	ggml : fix weak alias win32 (whisper/0) ggml-ci	2025-06-10 18:39:33 +03:00
0cc4m	97340b4c99	Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099 ) b5622	2025-06-10 13:01:33 +01:00
Isaac McFadyen	2bb0467043	rpc : nicer error messages for RPC server crash (#14076 ) b5621	2025-06-10 09:41:01 +03:00
Georgi Gerganov	b8e2194efc	sync : ggml ggml-ci b5620	2025-06-10 09:21:56 +03:00
Kai Pastor	1a3b5e80f7	Add in-build ggml::ggml ALIAS library (ggml/1260) Enable uniform linking with subproject and with find_package.	2025-06-10 09:21:56 +03:00
Georgi Gerganov	1f63e75f3b	metal : use less stack memory in FA kernel (#14088 ) * metal : use less stack memory in FA kernel ggml-ci * cont : fix BF16 variant b5618	2025-06-09 23:05:02 +03:00
Georgi Gerganov	40cbf571c9	kv-cache : fix shift and defrag logic (#14081 ) * kv-cache : fix shift ggml-ci * cont : reset shift[i] ggml-ci * cont : fix defrag erasing cells that didn't move ggml-ci b5617	2025-06-09 23:04:35 +03:00
Diego Devesa	7f4fbe5183	llama : allow building all tests on windows when not using shared libs (#13980 ) * llama : allow building all tests on windows when not using shared libraries * add static windows build to ci * tests : enable debug logs for test-chat --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b5616	2025-06-09 20:03:09 +02:00
xctan	f470bc36be	ggml-cpu : split arch-specific implementations (#13892 ) * move ggml-cpu-aarch64 to repack * split quantize_row_q8_0/1 * split helper functions * split ggml_vec_dot_q4_0_q8_0 * split ggml_vec_dot_q4_1_q8_1 * split ggml_vec_dot_q5_0_q8_0 * split ggml_vec_dot_q5_1_q8_1 * split ggml_vec_dot_q8_0_q8_0 * split ggml_vec_dot_tq1_0_q8_K * split ggml_vec_dot_tq2_0_q8_K * split ggml_vec_dot_q2_K_q8_K * split ggml_vec_dot_q3_K_q8_K * split ggml_vec_dot_q4_K_q8_K * split ggml_vec_dot_q5_K_q8_K * split ggml_vec_dot_q6_K_q8_K * split ggml_vec_dot_iq2_xxs_q8_K * split ggml_vec_dot_iq2_xs_q8_K * split ggml_vec_dot_iq2_s_q8_K * split ggml_vec_dot_iq3_xxs_q8_K * split ggml_vec_dot_iq3_s_q8_K * split ggml_vec_dot_iq1_s_q8_K * split ggml_vec_dot_iq1_m_q8_K * split ggml_vec_dot_iq4_nl_q8_0 * split ggml_vec_dot_iq4_xs_q8_K * fix typos * fix missing prototypes * rename ggml-cpu-quants.c * rename ggml-cpu-traits * rename arm folder * move cpu-feats-x86.cpp * rename ggml-cpu-hbm * update arm detection macro in quants.c * move iq quant tables * split ggml_quantize_mat_q8_0/K * split ggml_gemv_* * split ggml_gemm_* * rename namespace aarch64 to repack * use weak aliases to replace test macros * rename GGML_CPU_AARCH64 to GGML_CPU_REPACK * rename more aarch64 to repack * clean up rebase leftover * fix compilation errors * remove trailing spaces * try to fix clang compilation errors * try to fix clang compilation errors again * try to fix clang compilation errors, 3rd attempt * try to fix clang compilation errors, 4th attempt * try to fix clang compilation errors, 5th attempt * try to fix clang compilation errors, 6th attempt * try to fix clang compilation errors, 7th attempt * try to fix clang compilation errors, 8th attempt * try to fix clang compilation errors, 9th attempt * more cleanup * fix compilation errors * fix apple targets * fix a typo in arm version of ggml_vec_dot_q4_K_q8_K Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b5615	2025-06-09 16:47:13 +02:00
Diego Devesa	8f47e25f56	cuda : fix device sync on buffer clear (#14033 ) b5614	2025-06-09 16:36:26 +02:00
Georgi Gerganov	201b31dc2e	graph : fix geglu (#14077 ) ggml-ci b5613	2025-06-09 17:17:31 +03:00
Xinpeng Dou	e21d2d4ae2	CANN: Simplify the environment variable setting(#13104 ) * Simplify the environment variable setting to specify the memory pool type. * Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options. * update * fix CI * update * delete whitespace * fix according to review * update CANN.md * update CANN.md b5612	2025-06-09 19:47:39 +08:00
R0CKSTAR	dc0623fddb	webui: fix sidebar being covered by main content (#14082 ) * webui: fix sidebar being covered by main content Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * webui: update index.html.gz Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-06-09 12:01:17 +02:00
Georgi Gerganov	87d34b381d	server : fix LRU check (#14079 ) ggml-ci b5610	2025-06-09 12:57:58 +03:00
Nicolò Scipione	b460d16ae8	sycl: Add reorder to Q6_K mmvq implementation (#13885 ) * Add Reorder to Q6_K mmvq implementation * Address PR comments: clean up comments * Remove unused parameter after refactoring q4_k * Adding inline to function and removing unnecessary reference to int --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com> b5609	2025-06-09 11:47:07 +02:00
Đinh Trọng Huy	91a8ee6a6f	add geglu activation function (#14074 ) Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp> b5608	2025-06-09 05:15:31 +01:00
Yuanhao Ji	056eb74534	CANN: Enable labeler for Ascend NPU (#13914 )	2025-06-09 11:20:06 +08:00
Diego Devesa	247e5c6e44	cuda : fix buffer type check with integrated GPUs (#14069 ) b5606	2025-06-08 11:39:56 -07:00
吴小白	5787b5da57	ci: add LoongArch cross-compile build (#13944 )	2025-06-07 10:39:11 -03:00
Akarshan Biswas	228f34c9ce	SYCL: Implement few same quantized type copy kernels (#13739 ) * SYCL: Implement few same quantized type copy kernels * Use memcpy for copying contiguous tensors ggml-ci * feat(sycl): add contiguous tensor copy support and device checks Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance. * refactor: replace specific block copy functions with template The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed. * Exclude BF16 support for COPY tensors for now ggml-ci * perf: adjust SYCL copy kernel block sizes for efficiency Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations. b5604	2025-06-07 18:58:20 +05:30
Sigbjørn Skjæret	0974ad7a7c	llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050 ) b5603	2025-06-07 14:13:12 +02:00
Georgi Gerganov	745aa5319b	llama : deprecate llama_kv_self_ API (#14030 ) * llama : deprecate llama_kv_self_ API ggml-ci * llama : allow llama_memory_(nullptr) ggml-ci * memory : add flag for optional data clear in llama_memory_clear ggml-ci b5602	2025-06-06 14:11:15 +03:00
Georgi Gerganov	487a5e0401	context : fix SWA-related warning for multiple sequences (#14045 ) b5601	2025-06-06 13:29:18 +03:00
Sigbjørn Skjæret	d17a809ef0	llama : support multiple classifier outputs and labels (#13940 ) b5600	2025-06-06 09:03:25 +02:00
Sigbjørn Skjæret	1caae7fc6c	gguf-py : add add_classifier_output_labels method to writer (#14031 ) * add add_classifier_output_labels * use add_classifier_output_labels	2025-06-05 17:42:31 +02:00
Masato Nakasaka	669c13e0f6	vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001 ) * allowing B580 and U9-288V * experimenting code to detect Xe2 * allowing coopmat only for Xe2 GPUs * fixed comment wording * fixed comment wording * removed unnecessary driver check b5598	2025-06-05 16:00:29 +02:00
pockers21	146b88e8b3	ci: fix CUDA build failure on autodl cloud machines (#14005 ) Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection as 'native' fails on autodl cloud environments. Co-authored-by: pockers21 <liyang2@uniontech.com>	2025-06-05 16:25:29 +03:00
Georgi Gerganov	7f37b6cf1e	memory : migrate from llama_kv_cache to more generic llama_memory (#14006 ) * memory : merge llama_kv_cache into llama_memory + new `llama_memory` API ggml-ci * context : fix casts ggml-ci b5596	2025-06-05 15:29:22 +03:00
Diego Devesa	3a077146a4	llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013 ) b5595	2025-06-05 11:57:42 +02:00
Olexandr88	d01d112abb	readme : add badge (#13938 )	2025-06-05 10:50:55 +03:00
Sigbjørn Skjæret	9f47fa5792	vocab : warn about missing mask token (#14022 ) b5593	2025-06-05 09:29:18 +02:00
Georgi Gerganov	9e31bec4fd	context : fix pos_min initialization upon error decode (#14008 ) ggml-ci b5592	2025-06-05 09:06:29 +03:00
Jeff Bolz	5a8ae3053c	vulkan: automatically deduce size of push constants (#13936 ) b5591	2025-06-05 07:17:58 +02:00
Ervin Áron Tasnádi	0d3984424f	ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813 ) * * ggml-vulkan: adds op CONV_TRANSPOSE_1D * test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D * Missing barrier added to shader. Number of additional tests reduced to 108. * * Fixes typo in variable name. * Removes extra whitespaces. * Adds int64->int32 casts to prevent possible warnings. * Problem size reduced in tests to pass tests with llvmpipe. * supports_op condition moved from unintended position b5590	2025-06-04 22:02:00 +02:00

5739 Commits