R0CKSTAR
48b86c4fdb
cuda: remove linking to cublasLt (#14790)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-22 07:45:26 +08:00
Johannes Gäßler
0cf6725e9f
CUDA: FA support for Deepseek (Ampere or newer) (#13306)
* CUDA: FA support for Deepseek (Ampere or newer)
* do loop unrolling via C++ template
2025-05-09 13:34:58 +02:00
Johannes Gäßler
141a908a59
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135)
2025-05-06 23:35:51 +02:00
Diego Devesa
d7a14c42a1
build : fix build info on windows (#13239)
* build : fix build info on windows
* fix cuda host compiler msg
2025-05-01 21:48:08 +02:00
Erik Scholz
80c41ddd8f
CUDA: compress mode option and default to size (#12029)
CUDA 12.8 added an option to specify stronger compression for binaries, so we now default to "size".
2025-03-01 12:57:22 +01:00
Johannes Gäßler
a28e0d5eb1
CUDA: add option to compile without FlashAttention (#12025)
2025-02-22 20:44:34 +01:00
PureJourney
ecc8e3aeff
CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984)
* CUDA: correct the lowest Maxwell supported by CUDA 12
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-02-21 12:21:05 +01:00
Diego Devesa
94b87f87b5
cuda : add ampere to the list of default architectures (#11870)
2025-02-14 15:33:52 +01:00
Johannes Gäßler
864a0b67a6
CUDA: use mma PTX instructions for FlashAttention (#11583)
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-02 19:31:09 +01:00
Georgi Gerganov
ab96610b1e
cmake : enable warnings in llama (#10474)
* cmake : enable warnings in llama
ggml-ci
* cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS
* cmake : get_flags -> ggml_get_flags
* speculative-simple : fix warnings
* cmake : reuse ggml_get_flags
ggml-ci
* speculative-simple : fix compile warning
ggml-ci
2024-11-26 14:18:08 +02:00
Diego Devesa
5931c1f233
ggml : add support for dynamic loading of backends (#10469)
* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-25 15:13:39 +01:00
Diego Devesa
3ee6382d48
cuda : fix CUDA_FLAGS not being applied (#10403)
2024-11-19 14:29:38 +01:00
Diego Devesa
d3481e6316
cuda : only use native when supported by cmake (#10389)
2024-11-18 18:43:40 +01:00
Johannes Gäßler
ce2e59ba10
CMake: fix typo in comment [no ci] (#10360)
2024-11-17 12:59:38 +01:00
Johannes Gäßler
c3ea58aca4
CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)
2024-11-17 09:09:55 +01:00
Johannes Gäßler
467576b6cc
CMake: default to -arch=native for CUDA build (#10320)
2024-11-17 09:06:34 +01:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries (#10256)
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00