llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-07-19 09:08:04 +00:00

Reese Levine 21c021745d ggml: Add initial WebGPU backend (#14521)

* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults

* Initialize webgpu device

* Making progress on setting up the backend

* Finish more boilerplate/utility functions

* Organize file and work on alloc buffer

* Add webgpu_context to prepare for actually running some shaders

* Work on memset and add shader loading

* Work on memset polyfill

* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it

* Implement get_tensor and buffer_clear

* Finish rest of setup

* Start work on compute graph

* Basic mat mul working

* Work on emscripten build

* Basic WebGPU backend instructions

* Use EMSCRIPTEN flag

* Work on passing ci, implement 4d tensor multiplication

* Pass thread safety test

* Implement permuting for mul_mat and cpy

* minor cleanups

* Address feedback

* Remove division by type size in cpy op

* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends

* Fix name

* Fix macos dawn prefix path

2025-07-16 18:18:51 +03:00

backend

sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973)

2025-06-25 18:09:55 +02:00

development

model : add SmolLM3 (#14581)

2025-07-08 18:07:01 +02:00

multimodal

mtmd : rename llava directory to mtmd (#13311)

2025-05-05 16:02:55 +02:00

ops

Docs: script to auto-generate ggml operations docs (#14598)