Commit Graph

  • 5fce5f948d kv-cache : fix use-after-move of defrag info (#14189) b5669 Georgi Gerganov 2025-06-15 10:52:11 +03:00
  • 9ae4143bc6 model : add dots.llm1 architecture support (#14044) (#14118) b5668 Mikko Juola 2025-06-15 00:52:06 -07:00
  • c311ac664d cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) b5667 Georgi Gerganov 2025-06-15 10:08:58 +03:00
  • b9912ac570 batch : auto-gen positions + verify multi-sequence input (#14177) b5666 Georgi Gerganov 2025-06-15 09:18:37 +03:00
  • 00ba772610 docs : remove WIP since PR has been merged (#13912) Pepijn de Vos 2025-06-15 08:06:37 +02:00
  • 3cb203c89f llama-chat : Do not throw when tool parsing fails (#14012) b5664 Piotr 2025-06-14 18:25:15 +02:00
  • 2e42be42bd compare-llama-bench: add option to plot (#14169) Aman Gupta 2025-06-14 16:34:20 +08:00
  • dfa3c18266 tests : add LLAMA, LLAMA4, and GEMMA2 to test-model-random Francis Couture-Harpin 2025-06-13 20:02:29 -04:00
  • 61f6429470 Merge branch 'master' into compilade/test-model-random Francis Couture-Harpin 2025-06-13 14:31:39 -04:00
  • fb85a288d7 vocab : fix build (#14175) b5662 Georgi Gerganov 2025-06-13 20:03:05 +03:00
  • 40643edb86 sycl: fix docker image (#14144) Svetlozar Georgiev 2025-06-13 17:32:56 +01:00
  • 3cfbbdb44e Merge commit from fork Guy Goldenberg 2025-06-13 19:20:25 +03:00
  • 80709b70a2 batch : add LLAMA_BATCH_DEBUG environment variable (#14172) b5659 Georgi Gerganov 2025-06-13 18:35:00 +03:00
  • 26ff3685bf docs : Update multimodal.md (#14122) ddpasa 2025-06-13 15:17:53 +02:00
  • 60c666347b batch : rework llama_batch_allocr (#14153) b5657 Georgi Gerganov 2025-06-13 13:47:55 +03:00
  • b7cc7745e3 readme : remove survey link (#14168) Georgi Gerganov 2025-06-13 11:55:44 +03:00
  • cc8d081879 cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) b5655 Christian Kastner 2025-06-13 08:38:52 +00:00
  • d714dadb57 pooling : make cls_b and cls_out_b optional (#14165) b5654 Đinh Trọng Huy 2025-06-13 17:34:08 +09:00
  • ffad043973 server : fix SWA condition for full context reprocess (#14163) b5653 Georgi Gerganov 2025-06-13 11:18:25 +03:00
  • 0889eba570 sycl: Adding additional cpy dbg print output (#14034) b5652 Anton Mitkov 2025-06-13 08:51:39 +01:00
  • c61285e739 SYCL: Bump oneMath commit (#14152) b5651 Ewan Crawford 2025-06-13 08:45:37 +01:00
  • 09cf2c7c65 cmake : Improve build-info.cpp generation (#14156) b5650 Christian Kastner 2025-06-13 06:51:34 +00:00
  • c33fe8b8c4 vocab : prevent heap overflow when vocab is too small (#14145) b5649 Georgi Gerganov 2025-06-13 08:03:54 +03:00
  • ed52f3668e sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125) b5648 Anton Mitkov 2025-06-12 14:15:11 +01:00
  • a681b4ba83 readme : remove project status link (#14149) Georgi Gerganov 2025-06-12 14:43:09 +03:00
  • 7d516443dd server : re-enable SWA speculative decoding (#14131) b5646 Georgi Gerganov 2025-06-12 11:51:38 +03:00
  • 36fce98281 server : re-enable swa speculative decoding gg/server-reenable-swa-spec Georgi Gerganov 2025-06-11 20:22:06 +03:00
  • f6e1a7aa87 context : simplify output counting logic during decode (#14142) b5645 Georgi Gerganov 2025-06-12 11:50:01 +03:00
  • c3ee46fab4 batch : remove logits_all flag (#14141) b5644 Georgi Gerganov 2025-06-12 11:49:26 +03:00
  • ed99a8ea04 cont : fix comments gg/batch-simplify-output Georgi Gerganov 2025-06-12 10:43:55 +03:00
  • b8b8d3f368 context : simplify output counting logic during decode Georgi Gerganov 2025-06-12 10:35:09 +03:00
  • e2c0b6e46a cmake : handle whitepsaces in path during metal build (#14126) Georgi Gerganov 2025-06-12 10:14:24 +03:00
  • c53acda0b8 batch : remove logits_all flag Georgi Gerganov 2025-06-12 10:10:45 +03:00
  • 9596506965 kv-cache : fix split_equal handling in unified implementation (#14130) b5642 Georgi Gerganov 2025-06-12 10:02:15 +03:00
  • a20b2b05bc context : round n_tokens to next multiple of n_seqs when reserving (#14140) b5641 compilade 2025-06-12 02:56:04 -04:00
  • 8fe213af76 tests : avoid sprintf in test-model-random Francis Couture-Harpin 2025-06-12 02:48:11 -04:00
  • 7657835b33 tests : fix overflow and memory leaks in test-model-random Francis Couture-Harpin 2025-06-12 02:23:44 -04:00
  • 9cd402cbe1 tests : add test-model-random Francis Couture-Harpin 2025-06-11 17:53:55 -04:00
  • 2e89f76b7a common: fix issue with regex_escape routine on windows (#14133) b5640 bandoti 2025-06-11 17:19:44 -03:00
  • 4b6fb6524b context : round n_tokens to next multiple of n_seqs when reserving compilade/fix-batch-reserve-rwkv Francis Couture-Harpin 2025-06-11 16:17:36 -04:00
  • 0b6f6becb4 ggml-cpu : reorder SVE FMA for consistency with other SIMD arches Francis Couture-Harpin 2025-06-11 15:29:58 -04:00
  • 757aa6239d ggml : fix mamba2 ssm scan when compiled with SVE Francis Couture-Harpin 2025-06-11 12:33:05 -04:00
  • 532802f938 Implement GGML_CPU_ALL_VARIANTS for ARM (#14080) b5639 Christian Kastner 2025-06-11 19:07:44 +00:00
  • d4e0d95cf5 chore : clean up relative source dir paths (#14128) b5638 Sigbjørn Skjæret 2025-06-11 19:04:23 +02:00
  • 2fa5f2ceb8 graph : fix recurrent state copies when avoiding copies Francis Couture-Harpin 2025-06-10 20:00:41 -04:00
  • cc66a7f78f tests : add test-tokenizers-repo (#14017) b5637 Sigbjørn Skjæret 2025-06-11 17:16:32 +02:00
  • bd248d4dc7 vulkan: Better thread-safety for command pools/buffers (#14116) b5636 Jeff Bolz 2025-06-11 09:48:52 -05:00
  • 7781e5fe99 webui: Wrap long numbers instead of infinite horizontal scroll (#14062) Aman 2025-06-11 22:42:25 +08:00
  • 89a184fa71 kv-cache : relax SWA masking condition (#14119) b5634 Georgi Gerganov 2025-06-11 16:48:45 +03:00
  • 2baf07727f server : pass default --keep argument (#14120) b5633 Taylor 2025-06-11 06:43:43 -04:00
  • 7ae2932116 kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121) b5632 Georgi Gerganov 2025-06-11 12:52:45 +03:00
  • 1f7d50b293 vulkan: Track descriptor pools/sets per-context (#14109) b5631 Jeff Bolz 2025-06-11 00:19:25 -05:00
  • 4c763c8d1b opencl: add mul_mv_id_q4_0_f32_8x_flat (#14003) b5630 lhez 2025-06-10 16:55:58 -07:00
  • 9864bfcd01 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2025-06-10 19:22:15 -04:00
  • dad5c44398 kv-cache : avoid modifying recurrent cells when setting inputs (#13834) b5629 compilade 2025-06-10 18:20:14 -04:00
  • 55f6b9fa65 convert : fix duplicate key DeepSeek-R1 conversion error (#14103) Sigbjørn Skjæret 2025-06-10 23:29:52 +02:00
  • 3678b838bb llama : support GEGLU for jina-bert-v2 (#14090) b5627 Sigbjørn Skjæret 2025-06-10 18:02:08 +02:00
  • 652b70e667 vulkan: force device 0 in CI (#14106) Jeff Bolz 2025-06-10 10:53:47 -05:00
  • 3a12db23b6 Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104) b5625 Juk Armstrong 2025-06-10 16:48:07 +01:00
  • ae92c1855b sync : ggml b5624 Georgi Gerganov 2025-06-10 17:37:45 +03:00
  • b7ce1ad1e3 ggml : fix weak alias win32 (whisper/0) Georgi Gerganov 2025-06-10 11:34:10 +03:00
  • 97340b4c99 Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099) b5622 0cc4m 2025-06-10 14:01:33 +02:00
  • 2bb0467043 rpc : nicer error messages for RPC server crash (#14076) b5621 Isaac McFadyen 2025-06-10 02:41:01 -04:00
  • b8e2194efc sync : ggml b5620 Georgi Gerganov 2025-06-10 09:20:51 +03:00
  • 1a3b5e80f7 Add in-build ggml::ggml ALIAS library (ggml/1260) Kai Pastor 2025-06-03 12:33:28 +02:00
  • 62a9f34bae llama-graph : fix recurrent state copy compilade/readonly-recurrent-inputs Francis Couture-Harpin 2025-06-10 00:19:13 -04:00
  • dd6495ddc9 Merge branch 'master' into compilade/readonly-recurrent-inputs Francis Couture-Harpin 2025-06-09 16:35:36 -04:00
  • 1f63e75f3b metal : use less stack memory in FA kernel (#14088) b5618 Georgi Gerganov 2025-06-09 23:05:02 +03:00
  • 40cbf571c9 kv-cache : fix shift and defrag logic (#14081) b5617 Georgi Gerganov 2025-06-09 23:04:35 +03:00
  • 7f4fbe5183 llama : allow building all tests on windows when not using shared libs (#13980) b5616 Diego Devesa 2025-06-09 11:03:09 -07:00
  • c257a8871c cont : fix defrag erasing cells that didn't move gg/kv-fix-shift Georgi Gerganov 2025-06-09 20:45:56 +03:00
  • d564e04ce8 cont : reset shift[i] Georgi Gerganov 2025-06-09 19:24:25 +03:00
  • f470bc36be ggml-cpu : split arch-specific implementations (#13892) b5615 xctan 2025-06-09 22:47:13 +08:00
  • 8f47e25f56 cuda : fix device sync on buffer clear (#14033) b5614 Diego Devesa 2025-06-09 07:36:26 -07:00
  • 201b31dc2e graph : fix geglu (#14077) b5613 Georgi Gerganov 2025-06-09 17:17:31 +03:00
  • e21d2d4ae2 CANN: Simplify the environment variable setting(#13104) b5612 Xinpeng Dou 2025-06-09 19:47:39 +08:00
  • dc0623fddb webui: fix sidebar being covered by main content (#14082) R0CKSTAR 2025-06-09 18:01:17 +08:00
  • 87d34b381d server : fix LRU check (#14079) b5610 Georgi Gerganov 2025-06-09 12:57:58 +03:00
  • b460d16ae8 sycl: Add reorder to Q6_K mmvq implementation (#13885) b5609 Nicolò Scipione 2025-06-09 11:47:07 +02:00
  • eee8d481d9 kv-cache : fix shift Georgi Gerganov 2025-06-09 10:53:26 +03:00
  • 91a8ee6a6f add geglu activation function (#14074) b5608 Đinh Trọng Huy 2025-06-09 13:15:31 +09:00
  • 056eb74534 CANN: Enable labeler for Ascend NPU (#13914) Yuanhao Ji 2025-06-09 11:20:06 +08:00
  • 247e5c6e44 cuda : fix buffer type check with integrated GPUs (#14069) b5606 Diego Devesa 2025-06-08 11:39:56 -07:00
  • 5787b5da57 ci: add LoongArch cross-compile build (#13944) 吴小白 2025-06-07 21:39:11 +08:00
  • 228f34c9ce SYCL: Implement few same quantized type copy kernels (#13739) b5604 Akarshan Biswas 2025-06-07 18:58:20 +05:30
  • 0974ad7a7c llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050) b5603 Sigbjørn Skjæret 2025-06-07 14:13:12 +02:00
  • 745aa5319b llama : deprecate llama_kv_self_ API (#14030) b5602 Georgi Gerganov 2025-06-06 14:11:15 +03:00
  • 487a5e0401 context : fix SWA-related warning for multiple sequences (#14045) b5601 Georgi Gerganov 2025-06-06 13:29:18 +03:00
  • d17a809ef0 llama : support multiple classifier outputs and labels (#13940) b5600 Sigbjørn Skjæret 2025-06-06 09:03:25 +02:00
  • ca407742c5 profiler: initial support for profiling graph ops graph-profiler Max Krasnyansky 2025-06-05 14:38:13 -07:00
  • 1caae7fc6c gguf-py : add add_classifier_output_labels method to writer (#14031) Sigbjørn Skjæret 2025-06-05 17:42:31 +02:00
  • 669c13e0f6 vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001) b5598 Masato Nakasaka 2025-06-05 23:00:29 +09:00
  • 146b88e8b3 ci: fix CUDA build failure on autodl cloud machines (#14005) pockers21 2025-06-05 06:25:29 -07:00
  • 7f37b6cf1e memory : migrate from llama_kv_cache to more generic llama_memory (#14006) b5596 Georgi Gerganov 2025-06-05 15:29:22 +03:00
  • 3a077146a4 llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) b5595 Diego Devesa 2025-06-05 02:57:42 -07:00
  • d01d112abb readme : add badge (#13938) Olexandr88 2025-06-05 10:50:55 +03:00
  • 9f47fa5792 vocab : warn about missing mask token (#14022) b5593 Sigbjørn Skjæret 2025-06-05 09:29:18 +02:00
  • 9e31bec4fd context : fix pos_min initialization upon error decode (#14008) b5592 Georgi Gerganov 2025-06-05 09:06:29 +03:00
  • 5a8ae3053c vulkan: automatically deduce size of push constants (#13936) b5591 Jeff Bolz 2025-06-05 00:17:58 -05:00
  • 0d3984424f ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813) b5590 Ervin Áron Tasnádi 2025-06-04 22:02:00 +02:00