Commit Graph

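A listing in this shape (abbreviated hash, subject, ref decorations, author, ISO date) can be approximated with `git log`'s `--pretty` format. The sketch below is an assumption about how such output might be regenerated, not the exact command used here; it builds a throwaway repository with a single example commit so it is runnable anywhere, and note that `%d` prints decorations in parentheses rather than the bare tag names shown in this list.

```shell
# Hypothetical sketch: reproduce a commit-graph-style listing with git log.
# A temporary repo with one empty commit keeps the example self-contained.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "kv-cache : example subject"
# %h = short hash, %s = subject, %d = ref names, %an = author, %ad = date
git log --pretty=format:'  • %h %s%d %an %ad' --date=iso
```

Running this prints one bullet line per commit, e.g. `  • <hash> kv-cache : example subject (HEAD -> master) demo 2025-06-04 ...`.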
  • 3e63a58ef7 kv-cache : refactor the update/defrag mechanism (#13988) b5589 Georgi Gerganov 2025-06-04 18:58:20 +03:00
  • 2589ad3704 ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997) b5588 Diego Devesa 2025-06-04 06:37:40 -07:00
  • 482548716f releases : use dl backend for linux release, remove arm64 linux release (#13996) b5587 Diego Devesa 2025-06-04 04:15:54 -07:00
  • 3ac67535c8 llama-graph : use ggml_repeat_4d (#13998) b5586 Xuan-Son Nguyen 2025-06-04 10:11:26 +02:00
  • 0b4be4c435 CUDA: fix FTZ in FA for Gemma 3 (#13991) b5585 Johannes Gäßler 2025-06-04 08:57:05 +02:00
  • e0e806f52e kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985) b5584 Georgi Gerganov 2025-06-04 09:50:32 +03:00
  • 7e00e60ef8 vulkan: fix warnings in perf logger querypool code (#13937) Jeff Bolz 2025-06-03 13:30:22 -05:00
  • ea1431b0fa docs : add "Quick start" section for new users (#13862) Xuan-Son Nguyen 2025-06-03 13:09:36 +02:00
  • 71e74a3ac9 opencl: add backend_synchronize (#13939) b5581 lhez 2025-06-02 16:54:58 -07:00
  • bfb1e012a0 OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840) b5580 rmatif 2025-06-02 23:53:36 +00:00
  • 3637576288 server : disable speculative decoding for SWA models (#13970) b5579 Georgi Gerganov 2025-06-02 21:34:40 +03:00
  • ea394d7ab1 metal : use F32 accumulators in FA kernels (#13975) b5578 Georgi Gerganov 2025-06-02 21:33:40 +03:00
  • 5582c49c39 gemma : more consistent attention scaling for v2 and v3 (#13951) b5577 Georgi Gerganov 2025-06-02 20:54:26 +03:00
  • c9bbc77931 server: update deepseek reasoning format (pass reasoning_content as diffs) (#13933) b5576 Olivier Chafik 2025-06-02 10:15:44 -07:00
  • bfd322796c mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961) b5575 Xuan-Son Nguyen 2025-06-02 16:29:28 +02:00
  • 093e3f1feb cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966) b5574 shalinib-ibm 2025-06-02 17:48:36 +05:30
  • 663445b0de sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826) b5573 Atharva Dubey 2025-06-02 10:12:20 +01:00
  • 3862d954bb rope cisc/jina-embeddings-v3 Sigbjørn Skjæret 2025-06-01 21:46:15 +02:00
  • f5d0305d51 merge tensor loading into general bert Sigbjørn Skjæret 2025-06-01 20:55:03 +02:00
  • 404670134e Merge branch 'master' into cisc/jina-embeddings-v3 Sigbjørn Skjæret 2025-06-01 19:34:17 +02:00
  • 7675c555a1 gguf: fix failure on version == 0 (#13956) b5572 Johannes Gäßler 2025-06-01 18:08:05 +02:00
  • 5e1c3aed40 convert : fix nomic-bert-moe mask token (#13757) b5571 Sigbjørn Skjæret 2025-06-01 18:07:21 +02:00
  • c496fe0b1d convert : fix vocab padding code for bert models (#13954) Sigbjørn Skjæret 2025-06-01 17:23:11 +02:00
  • e57bb87ced ggml: check if non-native endian model is being loaded (#13943) b5569 Aaron Teo 2025-06-01 22:53:57 +08:00
  • f3a4b1659c sync : ggml b5568 Georgi Gerganov 2025-06-01 12:23:14 +03:00
  • 108009f5c7 vulkan : Remove unexpected ; (ggml/1253) Kai Pastor 2025-05-31 12:49:55 +02:00
  • d337252acf cmake : Fix broken CMake error messages (ggml/1252) Kai Pastor 2025-05-31 12:39:19 +02:00
  • af6f91db47 ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247) Radoslav Gerganov 2025-05-30 09:11:09 +03:00
  • a7b8d35f78 sync : whisper.cpp (ggml/1250) Georgi Gerganov 2025-05-29 13:29:50 +03:00
  • 6eba72b71c ggml : install dynamic backends (ggml/1240) Radoslav Gerganov 2025-05-29 08:34:46 +03:00
  • fedf034a98 ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) Daniel Tang 2025-05-27 20:58:46 -04:00
  • 8726392d3d readme : update bindings (#13950) ddh0 2025-06-01 03:44:30 -05:00
  • c04621711a parallel : fix n_junk == 0 (#13952) b5560 Georgi Gerganov 2025-06-01 11:42:16 +03:00
  • 0fc16b42e8 kv-cache : split implementation in separate sources (#13920) b5559 Georgi Gerganov 2025-06-01 11:39:27 +03:00
  • 053b1539c0 threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995) b5558 Max Krasnyansky 2025-05-31 15:39:19 -07:00
  • ac35e50c16 Update tools/llama-bench/llama-bench.cpp maxk/sched-prio-updates Max Krasnyansky 2025-05-31 15:38:37 -07:00
  • d3a2eb592d disable on windows cisc/test-tokenizers-remote Sigbjørn Skjæret 2025-05-31 23:17:18 +02:00
  • 7210ebe230 revert build changes Sigbjørn Skjæret 2025-05-31 23:16:56 +02:00
  • 05f94a0e90 add arch to matrix Sigbjørn Skjæret 2025-05-31 22:54:37 +02:00
  • f9a27178e5 download in batches Sigbjørn Skjæret 2025-05-31 22:35:26 +02:00
  • 3129639449 kv-cache : avoid modifying recurrent cells when setting inputs Francis Couture-Harpin 2025-05-27 14:04:50 -04:00
  • de8ec1348b Merge branch 'master' into cisc/test-tokenizers-remote Sigbjørn Skjæret 2025-05-31 21:25:34 +02:00
  • 8e1125a8db copy curl dll for tests Sigbjørn Skjæret 2025-05-31 21:22:37 +02:00
  • b3a89c3d9e docs : Note about necessity of having libcurl installed for standard build. (#13945) Jiří Podivín 2025-05-31 18:58:35 +02:00
  • e15898d1c7 server: allow unclosed thinking tags (#13931) b5556 Olivier Chafik 2025-05-31 08:26:10 -07:00
  • 803f8baf4f llama : deprecate explicit kv_self defrag/update calls (#13921) b5555 Georgi Gerganov 2025-05-31 15:58:33 +03:00
  • 3600cc2886 llama : use n_swa + n_ubatch cells for SWA cache (#13833) b5554 Georgi Gerganov 2025-05-31 15:57:44 +03:00
  • c7e0a2054b webui : Replace alert and confirm with custom modals. (#13711) igardev 2025-05-31 12:56:08 +03:00
  • 3f55f781f1 llama : auto-batch preparation (#13845) b5552 Georgi Gerganov 2025-05-31 12:55:57 +03:00
  • 51fa76f172 mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) (#13917) b5551 Xuan-Son Nguyen 2025-05-31 10:14:29 +02:00
  • 12d0188c0d kv-cache : refactor + add llama_memory_state_i (#13746) Georgi Gerganov 2025-05-31 10:24:04 +03:00
  • eb3949938e CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) (#13895) Shawn yang 2025-05-31 14:48:04 +08:00
  • 9087dd2664 threading: disable SetThreadInfo() calls for older Windows versions Max Krasnyansky 2025-04-17 14:13:29 -07:00
  • 199a838422 threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling Max Krasnyansky 2025-04-17 14:13:29 -07:00
  • e562eece7c CUDA: fix typo in FlashAttention code (#13926) b5548 Johannes Gäßler 2025-05-30 21:22:03 +02:00
  • b47ab7b8e9 sched : avoid changing cur_copy when a graph is already allocated (#13922) b5547 Diego Devesa 2025-05-30 09:56:19 -07:00
  • dd665cc9d4 parallel : increase the variability of the prompt lengths (#13927) b5546 Georgi Gerganov 2025-05-30 19:38:07 +03:00
  • df0c0c7d02 cuda : prevent using split buffers with 3d/4d matrices (#13919) b5545 Diego Devesa 2025-05-30 07:37:18 -07:00
  • b49a8ff96b SYCL: Add mrope kernel (#13755) b5544 Akarshan Biswas 2025-05-30 19:40:57 +05:30
  • 53f925074d sync : vendor (#13901) b5543 Georgi Gerganov 2025-05-30 16:25:45 +03:00
  • db38704f01 convert : fix rwkv bos/eos token (#13844) Sigbjørn Skjæret 2025-05-30 14:50:43 +02:00
  • 07e4351ce6 convert : allow partial update to the chkhsh pre-tokenizer list (#13847) b5541 Xuan-Son Nguyen 2025-05-30 12:24:37 +02:00
  • 291f2b6913 llama : add support for DistilBert (#13907) b5540 Đinh Trọng Huy 2025-05-30 18:56:02 +09:00
  • 4b4843adf3 windows builds adds build type to runtime output Sigbjørn Skjæret 2025-05-30 11:51:46 +02:00
  • 2c90da4c7e llama : use llm_build_granite for minicpm (#13911) b5539 zhangkaihuo 2025-05-30 16:31:48 +08:00
  • ec9e0301fe cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890) b5538 Christian Kastner 2025-05-30 01:28:54 +02:00
  • e83ba3e460 llama : add support for jina-reranker-v2 (#13900) b5537 Sigbjørn Skjæret 2025-05-29 21:42:31 +02:00
  • 2b131621e6 gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method (#13561) gguf-v0.17.0 Sigbjørn Skjæret 2025-05-29 15:36:05 +02:00
  • 54a2c7a8cd arm64: optimize q4_k_q8_k kernel with i8mm (#13886) b5535 Yibo Cai 2025-05-29 19:39:20 +08:00
  • 21fcc21ad5 cmake: Factor out CPU architecture detection (#13883) b5534 Christian Kastner 2025-05-29 12:50:25 +02:00
  • dd8ba93416 ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (#13882) b5533 Vineel Abhinav 2025-05-29 14:48:43 +05:30
  • 66c92061f5 tests : remove json.hpp from a test (#13880) b5532 Georgi Gerganov 2025-05-29 12:17:16 +03:00
  • 5ca82fc1d7 convert : workaround for AutoConfig dummy labels (#13881) Sigbjørn Skjæret 2025-05-29 10:00:57 +02:00
  • 6385b843a8 llama : add RobertaForSequenceClassification reranker support (#13875) b5530 Sigbjørn Skjæret 2025-05-29 08:15:01 +02:00
  • 1b8fb8152d ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843) b5529 Vineel Abhinav 2025-05-29 11:31:33 +05:30
  • 53ae30640e gguf-py : fix SafetensorRemote return on undefined size (< 0) (#13841) Beinsezii 2025-05-28 14:50:20 -07:00
  • 763d06edb7 llama : fix KV shift for qwen2vl (#13870) b5527 Xuan-Son Nguyen 2025-05-28 22:35:31 +02:00
  • 10961339b2 mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866) b5526 Xuan-Son Nguyen 2025-05-28 22:35:22 +02:00
  • d98f2a35fc ci: disable LLAMA_CURL for Linux cross-builds (#13871) bandoti 2025-05-28 15:46:47 -03:00
  • e0e3aa231d llama : add support for BertForSequenceClassification reranker (#13858) b5524 Đinh Trọng Huy 2025-05-29 02:01:58 +09:00
  • aa6dff05be convert: small addition to support LlamaModel (#13838) Đinh Trọng Huy 2025-05-28 23:34:18 +09:00
  • c962ae3382 server: fix remove 'image_url'/'input_audio' json-object effectlly for 'llama_params' in multimodal-model-mode (#13853) b5522 Sky 2025-05-28 22:33:54 +08:00
  • a3938fb53d convert : fix qwen omni conversion (#13859) Xuan-Son Nguyen 2025-05-28 16:12:35 +02:00
  • f7873fc698 tests : change umlaut test (#11600) Alex Fanthome 2025-05-28 14:49:28 +01:00
  • a68247439b CUDA: fix FA tg at long context for CC >= 8.9 (#13852) b5519 Johannes Gäßler 2025-05-28 13:33:37 +02:00
  • d97b9ade51 correct working directory for all builds Sigbjørn Skjæret 2025-05-28 12:49:36 +02:00
  • 0fe7183ae4 fix prototype for non-curl builds Sigbjørn Skjæret 2025-05-28 11:11:02 +02:00
  • ecbc92acd0 correct working directory Sigbjørn Skjæret 2025-05-28 10:16:34 +02:00
  • 26b79b6cb3 convert : fix tensor naming conflict for llama 4 vision (#13836) Xuan-Son Nguyen 2025-05-28 10:05:54 +02:00
  • 42ff1867bc add test-tokenizers-remote Sigbjørn Skjæret 2025-05-28 09:51:44 +02:00
  • 2d2e059f4f make common_download_file_single/multiple public Sigbjørn Skjæret 2025-05-28 09:50:41 +02:00
  • 1e8659e65a CANN: Add SOC TYPE printing in cmake configuration (#13837) b5517 leo-pony 2025-05-28 11:54:20 +08:00
  • a3c30846e4 opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm (#13787) b5516 lhez 2025-05-27 12:56:08 -07:00
  • 1701d4c54f opencl: mark mul_mat f32f32 as supporting non-contiguous tensors (#13790) b5515 lhez 2025-05-27 12:53:14 -07:00
  • bef8176387 vulkan: use timestamp queries for GGML_VULKAN_PERF (#13817) b5514 Jeff Bolz 2025-05-27 11:39:07 -05:00
  • 34b7c0439e cmake : add llama-cparams.cpp to build (#13832) b5513 Georgi Gerganov 2025-05-27 19:08:44 +03:00
  • f3101a8cc6 SYCL: add gelu_erf kernel (#13749) b5512 Akarshan Biswas 2025-05-27 20:52:59 +05:30
  • 1c49c70d07 sync : ggml Georgi Gerganov 2025-05-27 18:04:38 +03:00
  • a8ea03d8ad ggml : add ggml_repeat_4d (#13824) b5510 Xuan-Son Nguyen 2025-05-27 15:53:55 +02:00
  • 05f6ac6283 ggml : riscv: add xtheadvector support (#13720) b5509 xctan 2025-05-27 21:21:36 +08:00