Commit Graph

4489 Commits

Author SHA1 Message Date
3952a221af Fix missing file renames in Makefile due to changes in commit ae8de6d50a (#10413) b4139 2024-11-19 23:18:17 +01:00
42ae10bbcd add cmake rvv support (#10411) b4138 2024-11-19 21:10:31 +01:00
9fe0fb0626 sync : ggml b4137 2024-11-19 20:03:21 +02:00
611fabd792 metal : fix offset integer overflows in im2col (ggml/1015)
While running StableDiffusion.cpp locally with Metal, some offsets overflow and result in incorrect calculations
2024-11-19 20:03:21 +02:00
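
The bug class fixed here is worth illustrating: offsets computed in 32-bit arithmetic silently wrap once a tensor's element count crosses 2^32. A minimal C++ sketch of the failure mode and the 64-bit fix (the dimensions are hypothetical, not taken from the actual im2col kernel):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical tensor dimensions: 512 * 4096 * 4096 = 2^33, which does
    // not fit in 32 bits, so a 32-bit offset silently wraps around to 0.
    const int64_t channels = 512, height = 4096, width = 4096;

    const uint32_t wrapped = (uint32_t)(channels * height * width); // truncated
    const int64_t  offset  = channels * height * width;             // 64-bit throughout

    printf("32-bit offset: %u\n",   (unsigned)wrapped);  // prints 0
    printf("64-bit offset: %lld\n", (long long)offset);  // prints 8589934592
}
```
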
12b0ad953a metal : add GGML_UNARY_OP_ELU kernel (ggml/1018) 2024-11-19 20:03:21 +02:00
342397dc7e cmake: force MSVC compiler charset to utf-8 (#9989) b4134 2024-11-19 18:42:00 +01:00
2a11b6b094 Add required ggml-base and backend libs to cmake pkg (#10407) b4133 2024-11-19 17:10:30 +01:00
3ee6382d48 cuda : fix CUDA_FLAGS not being applied (#10403) b4132 2024-11-19 14:29:38 +01:00
8e752a777b llama : add check for KV cache shifts (#10401)
ggml-ci
b4131
2024-11-19 13:29:26 +02:00
a88ad007de llama : add OLMo November 2024 support (#10394)
* Add OLMo November 2024 constants

* Add OLMo November 2024 converter

* Add loading of OLMo November 2024 tensors and hyperparameters

* Add building of OLMo November 2024 model
b4130
2024-11-19 11:04:08 +02:00
2a1507c162 sycl : Add option to set the SYCL architecture for all targets (#10266)
* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve performance
b4129
2024-11-19 08:02:23 +00:00
b3e585988f vulkan: Optimize soft_max (#10301)
* vulkan: Optimize soft_max

Large soft_max could already saturate memory, but small/medium sizes were
pretty slow. The bulk of the gains for them comes from using a smaller
workgroup size, and making the workgroup size match the subgroup size also
makes the barriers much cheaper.

Cache some values in locals to avoid refetching/recomputing. And stamp
out a few "template instantiations" so smaller cases will fully unroll.

Add a missing early return for OOB rows. This happens when there are more
than 512 rows and the dispatch is 512 x H.

* vulkan: Further soft_max optimizations

Restore the workgroup size of 512 case, use it for >1024.

Use unrollable loops for more iteration counts.
b4128
2024-11-19 08:25:17 +01:00
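
The missing OOB early return described above follows from the dispatch shape: with more than 512 rows the dispatch is 512 x H workgroups, so the last slice of workgroups can land past the end of the data. A hedged C++ analogue of the row indexing (the real kernel is a Vulkan compute shader; names here are illustrative):

```cpp
#include <cstdint>

// Illustrative C++ analogue of the shader's row indexing. With a 512 x H
// dispatch, 512 * H can exceed nrows, so trailing workgroups must bail out
// before touching memory.
void soft_max_row(float * /*dst*/, const float * /*src*/, uint32_t nrows,
                  uint32_t workgroup_id_x, uint32_t workgroup_id_y) {
    const uint32_t row = workgroup_id_y * 512 + workgroup_id_x;
    if (row >= nrows) {
        return; // the missing out-of-bounds early return
    }
    // ... max-reduce, exponentiate, sum-reduce, then normalize this row ...
}
```
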
557924f222 sycl: Revert MUL_MAT_OP support changes (#10385) b4127 2024-11-19 08:50:04 +08:00
d3481e6316 cuda : only use native when supported by cmake (#10389) b4126 2024-11-18 18:43:40 +01:00
531cb1c233 Skip searching root path for cross-compile builds (#10383) 2024-11-18 16:23:58 +01:00
f139d2ea61 vulkan: remove use of null initializer (#10372)
Seems like this isn't working for vulkan-over-metal when the array is sized
by a spec constant. Maybe a spirv-cross limitation?
2024-11-18 08:28:42 -06:00
2eb76b2a5e flake.lock: Update (#10346)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/4aa36568d413aca0ea84a1684d2d46f55dbabad7?narHash=sha256-Zwl8YgTVJTEum%2BL%2B0zVAWvXAGbWAuXHax3KzuejaDyo%3D' (2024-11-05)
  → 'github:NixOS/nixpkgs/5e4fbfb6b3de1aa2872b76d49fafc942626e2add?narHash=sha256-OZiZ3m8SCMfh3B6bfGC/Bm4x3qc1m2SVEAlkV6iY7Yg%3D' (2024-11-15)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-11-18 06:08:20 -08:00
9b75f03cd2 Vulkan: Fix device info output format specifiers (#10366)
* Vulkan: Fix device info output format specifiers

* Vulkan: Use zu printf specifier for size_t instead of ld
b4122
2024-11-18 11:02:43 +01:00
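
The specifier change matters on Windows: MSVC uses the LLP64 model, where long is 32 bits, so %ld misreads a 64-bit size_t. A small C++ illustration of the portable form the commit switches to:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    const size_t device_heap = (size_t)1536 * 1024 * 1024;
    // %ld assumes long, which is only 32 bits under LLP64 (Windows) and so
    // mismatches a 64-bit size_t; %zu is the portable size_t specifier.
    printf("device heap: %zu bytes\n", device_heap);
}
```
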
75207b3a88 docker: use GGML_NATIVE=OFF (#10368) 2024-11-18 00:21:53 +01:00
76e9e58b78 CUDA: fix MMV kernel being used for FP16 src1 (#10357) b4120 2024-11-17 23:20:42 +01:00
ce2e59ba10 CMake: fix typo in comment [no ci] (#10360) 2024-11-17 12:59:38 +01:00
be5caccef9 llama : only use default buffer types for the KV cache (#10358) b4118 2024-11-17 12:25:45 +01:00
20a780c7b6 gitignore : ignore local run scripts [no ci] 2024-11-17 13:12:22 +02:00
cf32a9b93a metal : refactor kernel args into structs (#10238)
* metal : add kernel arg structs (wip)

* metal : fattn args

ggml-ci

* metal : cont + avoid potential int overflow [no ci]

* metal : mul mat struct (wip)

* cont : mul mat vec

* cont : pass by reference

* cont : args is first argument

* cont : use char ptr

* cont : shmem style

* cont : thread counters style

* cont : mul mm id

ggml-ci

* cont : int safety + register optimizations

ggml-ci

* metal : GGML_OP_CONCAT

ggml-ci

* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV

* metal : GGML_OP_REPEAT

* metal : GGML_OP_CPY

* metal : GGML_OP_RMS_NORM

* metal : GGML_OP_NORM

* metal : add TODOs for rest of ops

* ggml : add ggml-metal-impl.h

ggml-ci
2024-11-17 11:23:01 +02:00
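
The refactor bundles what were long scalar argument lists into one POD struct per kernel, passed by reference as the first argument, which pins argument order and integer widths in a single place. A hedged C++ sketch of the pattern (struct and field names are hypothetical; the real definitions live in ggml-metal-impl.h):

```cpp
#include <cstdint>

// Hypothetical args struct in the style this commit introduces: shared by
// the host code and the Metal shader, with 64-bit byte strides providing
// the "int safety" the commit log mentions.
struct kargs_rms_norm {
    int32_t ne00;  // elements per row
    int64_t nb01;  // row stride in bytes
    float   eps;   // epsilon for the RMS norm
};

// Metal side, sketched as a comment since this file is C++:
//   kernel void kernel_rms_norm(
//       constant kargs_rms_norm & args,  // args struct comes first
//       device const char * src0,        // raw char* plus byte strides
//       device       char * dst,
//       ...);
```
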
a43178299c ggml : fix undefined reference to 'getcpu' (#10354)
https://github.com/ggerganov/llama.cpp/issues/10352
b4115
2024-11-17 10:39:22 +02:00
c3ea58aca4 CUDA: remove DMMV, consolidate F16 mult mat vec (#10318) b4114 2024-11-17 09:09:55 +01:00
467576b6cc CMake: default to -arch=native for CUDA build (#10320) b4113 2024-11-17 09:06:34 +01:00
eda7e1d4f5 ggml : fix possible buffer use after free in sched reserve (#9930) b4112 2024-11-17 08:31:17 +02:00
24203e9dd7 ggml : inttypes.h -> cinttypes (#0)
ggml-ci
b4111
2024-11-17 08:30:29 +02:00
5d9e59979c ggml : adapt AMX to tensor->grad removal (#0)
ggml-ci
2024-11-17 08:30:29 +02:00
a4200cafad make : add ggml-opt (#0)
ggml-ci
2024-11-17 08:30:29 +02:00
84274a10c3 tests : remove test-grad0 2024-11-17 08:30:29 +02:00
68fcb4759c ggml : fix compile warnings (#0)
ggml-ci
2024-11-17 08:30:29 +02:00
8a43e940ab ggml: new optimization interface (ggml/988) 2024-11-17 08:30:29 +02:00
5c9a8b22b1 scripts : update sync 2024-11-17 08:30:29 +02:00
0fff7fd798 docs : vulkan build instructions to use git bash mingw64 (#10303) 2024-11-17 00:29:18 +01:00
4e54be0ec6 llama/ex: remove --logdir argument (#10339) b4103 2024-11-16 23:00:41 +01:00
db4cfd5dbc llamafile : fix include path (#0)
ggml-ci
b4102
2024-11-16 20:36:26 +02:00
8ee0d09ae6 make : auto-determine dependencies (#0) 2024-11-16 20:36:26 +02:00
bcdb7a2386 server: (web UI) Add samplers sequence customization (#10255)
* Samplers sequence: simplified and input field.

* Removed unused function

* Modify and use `settings-modal-short-input`

* rename "name" --> "label"

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
b4100
2024-11-16 14:26:54 +01:00
f245cc28d4 scripts : fix missing key in compare-llama-bench.py (#10332) 2024-11-16 10:32:50 +02:00
772703c8ff vulkan: Optimize some mat-vec mul quant shaders (#10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
b4098
2024-11-16 07:26:57 +01:00
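
The two-results-per-workgroup trick amortizes the B loads: both output rows consume the same B vector, so each b[j] fetched from memory is used twice. A hedged scalar C++ analogue of the idea (the real code is a Vulkan shader over quantized Q4/Q5 blocks, and it additionally bounds-checks the final, partially unrolled iteration, which this scalar loop does not need):

```cpp
#include <cstddef>

// Scalar sketch: accumulate two adjacent output rows in one pass so each
// load of b[j] (and much of the addressing math) is shared between them.
void mat_vec_two_rows(const float * A, const float * b, float * dst,
                      size_t ncols, size_t row0) {
    float sum0 = 0.0f, sum1 = 0.0f;
    for (size_t j = 0; j < ncols; ++j) {
        const float bj = b[j];                  // one B load feeds both rows
        sum0 += A[ row0      * ncols + j] * bj;
        sum1 += A[(row0 + 1) * ncols + j] * bj;
    }
    dst[row0]     = sum0;
    dst[row0 + 1] = sum1;
}
```
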
dd3a6ce9f8 vulkan : add cmake preset debug/release (#10306) 2024-11-16 02:59:33 +01:00
1e58ee1318 ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324) b4096 2024-11-16 01:53:37 +01:00
89e4caaaf0 llama : save number of parameters and the size in llama_model (#10286)
fixes #10285
b4095
2024-11-16 01:42:13 +01:00
74d73dc85c Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314) b4094 2024-11-15 22:27:00 +01:00
4047be74da scripts: update compare-llama-bench.py (#10319) b4093 2024-11-15 21:19:03 +01:00
883d206fbd ggml : fix some build issues b4092 2024-11-15 21:45:32 +02:00
09ecbcb596 cmake : fix ppc64 check (whisper/0)
ggml-ci
b4091
2024-11-15 15:44:06 +02:00
3225008973 ggml : vulkan logs (whisper/2547) 2024-11-15 15:44:06 +02:00