CI

In addition to GitHub Actions, llama.cpp uses a custom CI framework:

https://github.com/ggml-org/ci

It monitors the master branch for new commits and runs the ci/run.sh script on dedicated cloud instances. This allows us to execute heavier workloads than would be practical with GitHub Actions alone. Over time, the cloud instances will be scaled to cover various hardware architectures, including GPU and Apple Silicon instances.

Collaborators can optionally trigger a CI run by adding the ggml-ci keyword to their commit message. Only branches of this repository are monitored for this keyword.
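
For example, a commit created like this would be picked up by the custom CI (the subject line is purely illustrative; only the ggml-ci keyword matters):

git commit -m "ggml : fix CUDA build (ggml-ci)"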

It is good practice, before publishing changes, to execute the full CI locally on your machine:

mkdir tmp

# CPU-only build
bash ./ci/run.sh ./tmp/results ./tmp/mnt

# with CUDA support
GG_BUILD_CUDA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt

# with SYCL support
source /opt/intel/oneapi/setvars.sh
GG_BUILD_SYCL=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt

# with MUSA support
GG_BUILD_MUSA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
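
The first argument is the directory where results and logs are written, and the second is a mount/cache directory that is reused across runs; the ./tmp/... paths above are only examples and any writable location works. After a run finishes, you can inspect the output there, e.g.:

ls ./tmp/results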

Running MUSA CI in a Docker Container

Assuming $PWD is the root of the llama.cpp repository, follow these steps to set up and run MUSA CI in a Docker container:

1. Create a local directory to store cached models, configuration files, and the Python venv:

mkdir -p $HOME/llama.cpp/ci-cache

2. Create a local directory to store CI run results:

mkdir -p $HOME/llama.cpp/ci-results

3. Start a Docker container and run the CI:

docker run --privileged -it \
    -v $HOME/llama.cpp/ci-cache:/ci-cache \
    -v $HOME/llama.cpp/ci-results:/ci-results \
    -v $PWD:/ws -w /ws \
    mthreads/musa:rc4.0.1-mudnn-devel-ubuntu22.04

Inside the container, execute the following commands:

apt update -y && apt install -y bc cmake ccache git python3.10-venv time unzip wget
git config --global --add safe.directory /ws
GG_BUILD_MUSA=1 bash ./ci/run.sh /ci-results /ci-cache
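
If you prefer a single non-interactive invocation, the steps above can be combined into one command. This is only a sketch based on the commands shown in this section; adjust the image tag and host paths to your setup:

docker run --privileged --rm \
    -v $HOME/llama.cpp/ci-cache:/ci-cache \
    -v $HOME/llama.cpp/ci-results:/ci-results \
    -v $PWD:/ws -w /ws \
    mthreads/musa:rc4.0.1-mudnn-devel-ubuntu22.04 \
    bash -c "apt update -y && apt install -y bc cmake ccache git python3.10-venv time unzip wget && git config --global --add safe.directory /ws && GG_BUILD_MUSA=1 bash ./ci/run.sh /ci-results /ci-cache"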

This setup ensures that the CI runs within an isolated Docker environment while maintaining cached files and results across runs.