Commit Graph

  • 8846aace49 model : gemma3n text-only (#14400) master Xuan-Son Nguyen 2025-06-26 19:34:02 +02:00
  • a01047b041 cmake: regen vulkan shaders when shaders-gen sources change (#14398) bandoti 2025-06-26 13:46:53 -03:00
  • 8b5ea7ad67 SYCL: Take improvements from GLU branch and disable faulty fp16 exp after update sycl/fix_exp Akarshan 2025-06-26 19:48:24 +05:30
  • b25346221d llama : return mistral-v7-tekken as default template only (#14390) Sigbjørn Skjæret 2025-06-26 15:01:14 +02:00
  • e8215dbb96 metal : add special-case mat-vec mul for ne00 == 4 (#14385) b5760 Georgi Gerganov 2025-06-26 15:51:19 +03:00
  • 5783ae4359 metal : batch rows copy in a single threadgroup (#14384) b5759 Georgi Gerganov 2025-06-26 15:50:15 +03:00
  • bf5bcd0b85 docs: update s390x documentation + add faq (#14389) Aaron Teo 2025-06-26 18:41:41 +08:00
  • 716301d1b0 musa: enable fp16 mma (all) and cublas on qy2 (#13842) b5757 R0CKSTAR 2025-06-26 12:11:59 +08:00
  • 60ef23d6c1 ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) b5756 Aaron Teo 2025-06-26 05:49:04 +08:00
  • b193d53069 ggml : do not output unprintable characters on GGUF load failure (#14381) b5755 Sigbjørn Skjæret 2025-06-25 23:26:51 +02:00
  • 2bf9d539dd sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973) b5754 Anton Mitkov 2025-06-25 17:09:55 +01:00
  • 6179578988 batch : require non-coupled batch with sequential split_equal gg/llama-high-throughput Georgi Gerganov 2025-06-25 17:20:46 +03:00
  • 5eb1a88dc0 batch : optional requirement for sequential sequence ids Georgi Gerganov 2025-06-25 17:02:38 +03:00
  • 6663128448 kv-cache : rework kv_idxs, support seq_cp Georgi Gerganov 2025-06-25 14:48:47 +03:00
  • 0bb1da5854 kv-cache : simplify set_rows logic Georgi Gerganov 2025-06-24 23:14:24 +03:00
  • 73e53dc834 opencl: ref count ggml_backend_opencl_context and refactor profiling (#14254) b5753 lhez 2025-06-24 11:46:25 -07:00
  • 165d822044 graph : support iSWA virtual sequences Georgi Gerganov 2025-06-24 20:35:16 +03:00
  • 1b74b9d73b ggml : extend support for n_seq for soft_max and fattn Georgi Gerganov 2025-06-24 20:14:22 +03:00
  • 8c68219835 kv-cache : fix non-FA path with virtual sequences Georgi Gerganov 2025-06-24 20:01:05 +03:00
  • 7c6487b22f metal : extend ggml_soft_max_ext() to support n_seq dim Georgi Gerganov 2025-06-24 20:00:40 +03:00
  • 62af464227 batch : fix check for empty sequences in memory (#14364) b5752 Georgi Gerganov 2025-06-24 18:26:30 +03:00
  • c148cf1946 cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (#14362) b5751 Mathieu Baudier 2025-06-24 15:05:31 +02:00
  • 401c13e3c3 cont : fix build Georgi Gerganov 2025-06-24 15:59:47 +03:00
  • 132143938f tools : tmp adjustments (TMP) Georgi Gerganov 2025-06-24 15:02:58 +03:00
  • 52b9007176 llama : add "virtual sequences" Georgi Gerganov 2025-06-23 16:29:02 +03:00
  • 2aac8e81b0 add tests cisc/assistant-prefilling-content-array Sigbjørn Skjæret 2025-06-24 14:02:34 +02:00
  • 1500690046 text is stored in content_parts Sigbjørn Skjæret 2025-06-24 12:21:04 +02:00
  • e8f8c2c711 fix assistant prefilling when content is an array Sigbjørn Skjæret 2025-06-24 12:01:37 +02:00
  • 1b809cee22 server : move no API key doc to /health (#14352) Nigel Bosch 2025-06-24 08:59:11 +00:00
  • abf241045d main : honor --verbose-prompt on interactive prompts (#14350) b5749 Sigbjørn Skjæret 2025-06-24 09:31:00 +02:00
  • 901e20bbe5 jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349) Bartowski 2025-06-24 02:17:58 -04:00
  • 0142961a2e CUDA/HIP: optimize mmv paths taken for HIP devices (#14324) b5747 uvos 2025-06-24 01:12:56 +02:00
  • e33de128c7 common : move string_remove_suffix from quantize and imatrix compilade/imatrix-batched-chunks Francis Couture-Harpin 2025-06-23 16:22:27 -04:00
  • ce82bd0117 ci: add workflow for relocatable cmake package (#14346) bandoti 2025-06-23 15:30:51 -03:00
  • 118d52fefc Merge branch 'master' into compilade/imatrix-batched-chunks Francis Couture-Harpin 2025-06-23 12:54:56 -04:00
  • 0e79355075 quantize : fix dataset name loading from gguf imatrix Francis Couture-Harpin 2025-06-23 12:43:25 -04:00
  • 43cd2b3eb5 imatrix : support 3d tensors with MUL_MAT Francis Couture-Harpin 2025-06-23 11:50:54 -04:00
  • afdb669206 Merge branch 'master' into compilade/mamba2 compilade/mamba2 Francis Couture-Harpin 2025-06-23 10:40:16 -04:00
  • bf2a99e3cb vulkan: update windows SDK in release.yml (#14344) b5745 Jeff Bolz 2025-06-23 08:44:48 -05:00
  • 72c6bc3f3d llama : better rwkv chat template and add missing inputs.use_jinja setting (#14336) b5744 Molly Sophia 2025-06-23 19:56:19 +08:00
  • defe2158dd CUDA: mul_mat_v support for batch sizes > 1 (#14262) b5743 Johannes Gäßler 2025-06-23 13:11:31 +02:00
  • 36f8e20d08 kv-cache : utilize ggml_set_rows broadcast gg/kv-cache-use-set-rows Georgi Gerganov 2025-06-22 10:28:22 +03:00
  • 332f073589 cont : support non-continuous slots Georgi Gerganov 2025-06-21 16:23:31 +03:00
  • 39d0b1e8df cont : kv-cells cp/set for non-cont slots Georgi Gerganov 2025-06-21 15:26:01 +03:00
  • f875d6cb72 cont : migrate to using set of indices instead of slot head Georgi Gerganov 2025-06-21 11:57:07 +03:00
  • db2bb378b1 cont : gate the ggml_set_rows usage with env var Georgi Gerganov 2025-06-21 10:37:06 +03:00
  • 79dac3c861 kv-cache : use ggml_set_rows Georgi Gerganov 2025-06-19 19:26:47 +03:00
  • 1f647b5992 ggml : fix supports_op Radoslav Gerganov 2025-06-23 11:25:16 +03:00
  • eba97574da ggml : simplify forward_dup_f32 Radoslav Gerganov 2025-06-23 11:16:54 +03:00
  • c0cfc2f78b metal : add ggml_set_rows implementation Georgi Gerganov 2025-06-22 18:45:52 +03:00
  • 828e5d2fcd tests : add ggml_set_rows Georgi Gerganov 2025-06-22 18:45:30 +03:00
  • e73690a69d ggml : ggml_set_rows update comment + better index name Georgi Gerganov 2025-06-22 18:45:07 +03:00
  • e89709721b ggml : support GGML_TYPE_F32 ".from_float" trait Georgi Gerganov 2025-06-22 18:44:42 +03:00
  • 630c84a2bd ggml : ggml_set_rows support quantized dst Georgi Gerganov 2025-06-22 11:10:42 +03:00
  • df71c803b4 ggml : ggml_set_rows support broadcast Georgi Gerganov 2025-06-22 10:28:07 +03:00
  • 313a444b22 ggml : add ggml_is_contiguous_rows Georgi Gerganov 2025-06-22 10:27:31 +03:00
  • 695b6b7025 ggml : add repeat impl for i64 Georgi Gerganov 2025-06-21 09:07:25 +03:00
  • f2cd962fe2 use I64 for indices Radoslav Gerganov 2025-06-20 11:37:43 +03:00
  • c1a581a10b ggml : add ggml_set_rows Radoslav Gerganov 2025-06-19 11:04:23 +03:00
  • 7b50d589a8 kv-cells : fix tracking of seq_pos (#14339) b5742 Georgi Gerganov 2025-06-23 12:27:35 +03:00
  • 3a9457df96 vulkan: update windows SDK in CI (#14334) Jeff Bolz 2025-06-23 03:19:24 -05:00
  • fa4a9f2a1c quantize : handle user-defined pruning of whole layers (blocks) (#13037) b5740 Ed Addario 2025-06-22 22:16:26 +01:00
  • 238005c2dc gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) Sigbjørn Skjæret 2025-06-22 19:46:17 +02:00
  • 66aba7aca9 run : avoid double tokenization (#14327) b5738 Ruikai Peng 2025-06-23 01:28:06 +08:00
  • f1f5e82df6 examples : fix is_first logic for tokenization (#14329) b5737 Georgi Gerganov 2025-06-22 20:10:07 +03:00
  • af3373f1ad HIP: enable vec fattn on RDNA4 (#14323) b5736 uvos 2025-06-22 16:51:23 +02:00
  • ab46d11de5 Refactor: Optimize SYCL element-wise operations with unary function inlining cisc/unary-reglu-geglu-swiglu Akarshan 2025-06-22 19:21:19 +05:30
  • 5d5c066de8 mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326) b5735 yuiseki 2025-06-22 21:44:57 +09:00
  • 40bfa04c95 common : use std::string_view now that we target c++17 (#14319) b5734 Sigbjørn Skjæret 2025-06-22 07:37:43 +02:00
  • a234e09f41 GGML: increase OP count in assertion Akarshan 2025-06-22 10:36:09 +05:30
  • 35dacd1a93 ggml : implement GLU for split up/gate (#14181) Sigbjørn Skjæret 2025-06-18 16:11:07 +02:00
  • a9aedf46b4 SYCL: Implement fused kernel GEGLU, SWIGLU and REGLU for single up+gate Akarshan 2025-06-14 18:34:21 +05:30
  • 34d1aedafb Vulkan: Add GLU ops and shaders 0cc4m 2025-06-14 10:06:55 +00:00
  • d5934297ef update comment [no ci] Sigbjørn Skjæret 2025-06-13 23:08:18 +02:00
  • 0b2703fc57 implement swapped variants (cpu/cuda) Sigbjørn Skjæret 2025-06-13 22:48:53 +02:00
  • f8705a2399 64bit multiplication [no ci] Sigbjørn Skjæret 2025-06-13 17:11:01 +02:00
  • 70e8b48e6a more constraints and use 64bit ints Sigbjørn Skjæret 2025-06-13 16:34:23 +02:00
  • cfa9c7a47a add CUDA_GLU_BLOCK_SIZE [no ci] Sigbjørn Skjæret 2025-06-13 16:10:03 +02:00
  • d9ddeb9dfd metal : add glu kernels Georgi Gerganov 2025-06-13 16:12:25 +03:00
  • a341aa3c2b refactor into GGML_GLU_OP Sigbjørn Skjæret 2025-06-13 10:14:32 +02:00
  • f8c20809de tighten constraints again Sigbjørn Skjæret 2025-06-13 09:00:30 +02:00
  • a1a7b6dfa9 implement unary REGLU/GEGLU/SWIGLU cuda ops Sigbjørn Skjæret 2025-06-13 01:11:57 +02:00
  • bb2fda70ae special case gated ops Sigbjørn Skjæret 2025-06-13 01:07:49 +02:00
  • 21c4963bd3 fix ggml_vec_geglu_f16 Sigbjørn Skjæret 2025-06-13 01:04:59 +02:00
  • 56c7993171 duplicate shape of source Sigbjørn Skjæret 2025-06-13 00:51:53 +02:00
  • 5a490f07a2 relax constraints Sigbjørn Skjæret 2025-06-12 23:05:51 +02:00
  • 76c9bc1731 implement unary REGLU/GEGLU/SWIGLU cpu ops Sigbjørn Skjæret 2025-06-12 17:39:56 +02:00
  • aa064b2eb7 CUDA: add mean operation (#14313) b5733 Aman Gupta 2025-06-22 12:39:54 +08:00
  • aa0ef5c578 gguf-py : fix Qwen3-Embedding eos token (#14314) Sigbjørn Skjæret 2025-06-21 18:12:05 +02:00
  • bb16041cae Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) b5731 Markus Tavenrath 2025-06-21 08:17:12 +02:00
  • 58cba76a9a gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) Sigbjørn Skjæret 2025-06-21 07:33:21 +02:00
  • 67ae5312e2 metal : fix thread-safety (#14300) b5729 Georgi Gerganov 2025-06-21 08:04:18 +03:00
  • 692e3cdd0a memory : rename interface to llama_memory_context_i (#14296) b5728 Georgi Gerganov 2025-06-21 08:03:46 +03:00
  • b23fa0b3f4 convert : fix Llama 4 conversion (#14311) Daniel Han 2025-06-20 21:32:01 -07:00
  • 06cbedfca1 sync : ggml b5726 Georgi Gerganov 2025-06-20 20:50:24 +03:00
  • b7147673f2 Add ggml_roll (ggml/1274) Acly 2025-06-18 13:34:50 +02:00
  • d860dd99a4 docs : fix the link to llama.h (#14293) David Chiu 2025-06-21 01:43:35 +08:00
  • c959f462a0 CUDA: add conv_2d_transpose (#14287) b5723 Aman Gupta 2025-06-20 22:48:24 +08:00
  • 22015b2092 lint : remove trailing whitespace (#14304) b5722 Sigbjørn Skjæret 2025-06-20 16:37:44 +02:00
  • dd6e6d0b6a vocab : prevent tokenizer overflow (#14301) b5721 Ruikai Peng 2025-06-20 22:13:06 +08:00