* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults
* Initialize webgpu device
* Making progress on setting up the backend
* Finish more boilerplate/utility functions
* Organize file and work on alloc buffer
* Add webgpu_context to prepare for actually running some shaders
* Work on memset and add shader loading
* Work on memset polyfill
* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it
* Implement get_tensor and buffer_clear
* Finish rest of setup
* Start work on compute graph
* Basic mat mul working
* Work on emscripten build
* Basic WebGPU backend instructions
* Use EMSCRIPTEN flag
* Work on passing ci, implement 4d tensor multiplication
* Pass thread safety test
* Implement permuting for mul_mat and cpy
* minor cleanups
* Address feedback
* Remove division by type size in cpy op
* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends
* Fix name
* Fix macos dawn prefix path
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
* [cl][adreno] Add Adreno GPU support
Add new OpenCL backend to support Adreno GPUs
---------
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
* [cl][ci] Add workflow for CL
* [cl][adreno] Fix memory leak for non SMALL_ALLOC path
* opencl: integrate backend dyn.load interface and fix compiler and format warnings
* opencl: remove small-alloc support and fix build errors for non-opencl platforms
* opencl: fixed merge conflict (MUSA added twice in cmake)
* opencl-ci: use RUNNER_TEMP instead of github.workspace
* opencl: fix embed tool invocation with python3
* opencl: CI workflow fixes
* opencl: Clean up small-alloc in CMake files
* opencl: cleanup ggml-opencl2 header file
* opencl: use ulong for offsets and strides in ADD kernel
* opencl: use cl_ulong for all offsets
* opencl: use cl_ulong for sizes and strides
* opencl: use `GGML_LOG_xxx` instead of `fprintf(stderr, ...)`
* opencl: rename backend `opencl2` -> `opencl`
* opencl: rename kernel files `ggml-opencl2` -> `ggml-opencl`
* opencl: make OpenCL required, remove redundant lib and inc directories
* `ggml-base`, `..` and `.` are added by `ggml_add_backend_library`
* opencl: rename backend - funcs, structs, etc `opencl2` -> `opencl`
* opencl: remove copyright marker since main license already covers
* opencl: replace some more OPENCL2 leftovers
* opencl: remove limits on `tensor_extra`
* opencl: use pools for `tensor_extra`
* opencl: fix compiler warnings with GCC and Clang
Still getting the warning about clCreateCmdQueue being obsolete.
Will fix that separately.
* opencl: fail gracefully if opencl devices are not available
Also for unsupported GPUs.
* opencl: fix MSVC builds (string length error)
* opencl: check for various requirements, allow deprecated API
* opencl: update log message for unsupported GPUs
---------
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com>
Co-authored-by: Alexander Angus <quic_aangus@quicinc.com>
Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com>
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>
* Fix crash caused by ggml_backend_load_all when launching on AndroidActivity.
Details:
Calling ggml_backend_load_all during initialization in the AndroidActivity project leads to a crash with the error:
terminating with uncaught exception of type std::__ndk1::__fs::filesystem::filesystem_error: filesystem error: in directory_iterator::directory_iterator(...): Permission denied [./].
This issue occurs because AndroidActivity restricts file access due to sandboxing.
Reproduction:
In the example folder, the LlamaAndroid project can reproduce the crash by calling ggml_backend_load_all first in Java_android_llama_cpp_LLamaAndroid_backend_1init.
* Update ggml/src/ggml-backend-reg.cpp
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* feat: load all backends from a user-provided search path
* fix: Windows search path
* refactor: rename `ggml_backend_load_all_in_search_path` to `ggml_backend_load_all_from_path`
* refactor: rename `search_path` to `dir_path`
* fix: change `NULL` to `nullptr`
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* fix: change `NULL` to `nullptr`
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* llama : accept a list of devices to use to offload a model
* accept `--dev none` to completely disable offloading
* fix dev list with dl backends
* rename env parameter to LLAMA_ARG_DEVICE for consistency