eaea325324
clip : fix model size display ( #13153 )
2025-04-28 21:23:19 +02:00
5fa9e63be8
clip : refactor set input for cgraph + fix qwen2.5vl input ( #13136 )
...
* clip : refactor set input for cgraph
* more strict assert
* minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere
* split qwen2 and qwen2.5 code blocks
* minor style fix
2025-04-28 12:18:59 +02:00
59e991c23c
Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402 as has_qwen2vl_merger migration was incomplete ( #13133 )
2025-04-27 12:43:37 +02:00
ca2bb89eac
clip : Add Qwen2.5VL support ( #12402 )
...
* implment vision model architecture, gguf convertor
* handle window attention inputs
* add debug utils
* fix few incorrect tensor memory layout
* move position id remap out of ggml to avoid int32 cuda operations
* cleaning up
* ignore transformers Qwen2_5_xxx type check
* remove not so often use `qwen2vl-cli` debug functions
* remove commented-out code blocks
* fix attn weight scaling after rebase
* add `PROJECTOR_TYPE_QWEN2_5_VL`
* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`
* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`
* remove `attn_window_size` from gguf
* fix model conversion
* clean up
* fix merging problem
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
2025-04-27 10:10:34 +02:00
4753791e70
clip : improve projector naming ( #13118 )
...
* clip : improve projector naming
* no more kv has_llava_projector
* rm unused kv
* rm more unused
2025-04-26 22:39:47 +02:00
edb18b6e8f
clip : fix pixtral on some GPU backends ( #13097 )
...
* clip : fix pixtral on some GPU backends
* refactor inp_raw set
* rm outdated comment
* fix dynamic size
* add TODO
2025-04-25 14:31:42 +02:00
13be08daf9
clip : remove boi/eoi embeddings for GLM-edge model ( #13081 )
2025-04-24 22:17:04 +02:00
ecda2ec4b3
mtmd : Support Pixtral 12B ( #13065 )
...
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
2025-04-23 20:21:59 +02:00
dc39a5e7a8
mtmd : support SmolVLM (version 1 and 2) ( #13050 )
...
* mtmd : support SmolVLM (version 1 and 2)
* correct chat template
* fix n_patches
* scale_factor is an int
* add more models to test
2025-04-22 16:24:54 +02:00
37b9f0d29d
clip : refactor, add image_manipulation
and llava_uhd
classes ( #13011 )
...
* clip : refactor, add `image_manipulation` and `llava_uhd`
* refactor llava-1.6 preprocessing
* simplify logic for llava-1.5
* missing include
2025-04-19 09:15:45 +02:00
e59ea539b8
llava: Fix cpu-only clip image encoding sefault ( #12907 )
...
* llava: Fix cpu-only clip image encoding
* clip : no smart ptr for ggml_backend_t
* Fix for backend_ptr push_back
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
2025-04-12 07:29:03 +02:00
0c50923944
clip : use smart pointer ( ⚠️ breaking change) ( #12869 )
...
* clip : use smart pointers
* fix warmup
* add forward declaration
* misisng include
* fix include (2)
* composite
* simplify batch ptr
* fix conflict
2025-04-11 12:09:39 +02:00
8b9cc7cdd8
llava : introduce libmtmd ( #12849 )
...
* wip llava2
* migrated gemma3 to llava2
* add timings
* correct pre/postfix
* fix missing include
* fix compilation unused var warn
* update llava2_tokenize
* change name llava2 --> mtmd
* improve api
* refine helpers
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-04-10 22:57:16 +02:00
65a69e6e1b
clip : do not print ftype ( #12832 )
2025-04-09 10:09:53 +02:00
b32efad2bc
llava: improve clip_ctx destructor to not memleak load_image_size ( #12834 )
2025-04-08 22:01:58 +02:00
2dabf759e7
llava: add more helper functions to check projector types in clip context ( #12824 )
...
Signed-off-by: dm4 <sunrisedm4@gmail.com >
2025-04-08 15:49:13 +02:00
0364178ca2
clip : refactor clip_init, add tests ( #12757 )
...
* refactor clip_init
* fix loading file
* fix style
* test ok
* better test with report
* add missing headers
* clarify
* add KEY_MM_PATCH_MERGE_TYPE
* remove bool has_* pattern
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update examples/llava/clip.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* use ggml_soft_max_ext
* refactor logging system
* add minicpm-v-o 2.6 for testing
* use nullptr everywhere
* fix Yi-VL model
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-04-05 17:17:40 +02:00
1a85949067
llava : proper description fix ( #12668 )
2025-03-31 11:28:30 +02:00
f52d59d771
llava : fix clip loading GGUFs with missing description ( #12660 )
2025-03-31 11:07:07 +02:00
02082f1519
clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend ( #12566 )
...
* [Fix] Compiling clip-quantize-cli and running it in a CUDA environment will cause ggml_fp16_to_fp32 to report an error when trying to access video memory. You need to switch to the CPU backend to run quantize.
After the fix, it will automatically run in the CPU backend and will no longer be bound to CUDA.
* [Fix]Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.
2025-03-26 15:06:04 +01:00
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) ( #12343 )
...
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344 )
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
2025-03-12 09:30:24 +01:00
96e1280839
clip : bring back GPU support ( #12322 )
...
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
2025-03-11 09:20:16 +01:00
8352cdc87b
llava : fix bug in minicpm-v code ( #11513 )
...
* fix bug in minicpm-v code
* update readme of minicpm-v
2025-03-10 10:33:24 +02:00
7a2c913e66
llava : Add Granite Vision Support ( #11794 )
...
* Add super wip scripts for multimodal granite gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Add example for converting mmgranite to gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* remove hardcoded path
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Add vision feature layer to gguf params
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Clean up llava surgery and remove name substitution hacks
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Add transformers llava next tensor name mapping
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Make siglip / openclip mutuall exclusive
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Fix projector linear substitution
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Fix linear 2 substitution index
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Increase max flattened gridpoints to 64
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Fix hardcoded concat for multiple feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Pull vision feature layers out of gguf keys
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* fix num gridpoints and use all layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Avoid dropping last image encoder layer in llava models
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Use 10 for max number of patches
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Standardize vision feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Cleanup logs
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Update comment for vision feature layer init
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Update notes for alternative to legacy llm conversion script
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Fix notes rendering
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Add v prefix to vision feature layer log
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Use current defaults for feature layer
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Use constant for max gridpoints / feat layers, style fixes
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* clarify non-negative feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Remove CLIP_API from func signature
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* USE MAX_IMAGE_FEATURE_LAYERS const in layer calc
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Clarify feature layers are non negative ints and not uint
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Fix condition for reading feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* pop last llava layer when feature layers are unset
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Fix unset vision layer 0
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Update examples/llava/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
* Reenable assertion for out of bounds get_rows
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Use std vector for gridpoints and feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Caculate max feature layer at load time
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Include base patch for granite vision allocation
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Fix trailing whitespace
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Add max num patches = 10 back for minicpmv
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Use unordered set to store feature layers
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Use max feature layer for postnorm
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
* Apply suggestions from code review
---------
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-02-24 17:09:51 +01:00
36c258ee92
llava: build clip image from pixels ( #11999 )
...
* llava: export function `clip_build_img_from_pixels` to build image from pixels decoded by other libraries instead of stb_image.h for better performance
* Apply suggestions from code review
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-02-22 15:28:28 +01:00
ee02ad02c5
clip : fix visual encoders with no CLS ( #11982 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-02-21 08:11:03 +02:00
1ec208083c
llava: add quantization for the visual projector LLAVA, Qwen2VL ( #11644 )
...
* Added quantization for visual projector
* Added README
* Fixed the clip quantize implementation in the file
* Fixed the gcc warning regarding minor linting
* Removed trailing whitespace
2025-02-05 10:45:40 +03:00
0cec062a63
llama : add support for GLM-Edge and GLM-Edge-V series models ( #10573 )
...
* add glm edge chat model
* use config partial_rotary_factor as rope ratio
* support for glm edge model
* vision model support
* remove debug info
* fix format
* llava.cpp trailing whitespace
* remove unused AutoTokenizer
* Update src/llama.cpp for not contain <|end|> or </s>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com >
* add edge template
* fix chat template
* fix confict
* fix confict
* fix ci err
* fix format err
* fix template err
* 9b hf chat support
* format
* format clip.cpp
* fix format
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/llava/clip.cpp
* fix format
* minor : style
---------
Co-authored-by: liyuhang <yuhang.li@zhipuai.cn >
Co-authored-by: piDack <pcdack@hotmail.co >
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com >
Co-authored-by: liyuhang <yuhang.li@aminer.cn >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-02-02 09:48:46 +02:00
3e3357fd77
llava : support Minicpm-omni ( #11289 )
...
* init
* add readme
* update readme
* no use make
* update readme
* update fix code
* fix editorconfig-checker
* no change convert py
* use clip_image_u8_free
2025-01-22 09:35:48 +02:00
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes ( #11030 )
...
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types
2025-01-07 18:01:58 +01:00
d408bb9268
clip : disable GPU support ( #10896 )
...
ggml-ci
2024-12-19 18:47:15 +02:00
0bf2d10c55
tts : add OuteTTS support ( #10784 )
...
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add matchematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder
2024-12-18 19:27:21 +02:00
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE ( #10361 )
...
* Barebone Qwen2VL LLM convertor
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script but fix
* correcting vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, out dated func args
* update `llama_hparams`
* update to keep up stream changes
* resolve linter, test errors
* add makefile entry, update speical image padding token
* add mrope unit test, fix few compiler warnings
* rename `mrope` related function, params
* minor updates on debug util, bug fixs
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* fix traililng whitespce
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remote old `rope_section` compare operator
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2024-12-14 14:43:46 +02:00
01e6d9bb71
clip : add sycl support ( #10574 )
...
Co-authored-by: piDack <pcdack@hotmail.co >
2024-12-04 01:26:37 +01:00
678d7994f4
llava: return false instead of exit ( #10546 )
2024-11-29 01:09:46 +01:00
9f40989351
ggml : move CPU backend to a separate file ( #10144 )
2024-11-03 19:34:08 +01:00
cad341d889
metal : reduce command encoding overhead ( #9698 )
...
* metal : reduce command encoding overhead
ggml-ci
* metal : add comments
2024-10-01 16:00:25 +03:00
6262d13e0b
common : reimplement logging ( #9418 )
...
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-15 20:46:12 +03:00
d6a04f872d
ggml : hide ggml_object, ggml_cgraph, ggml_hash_set ( #9408 )
...
* ggml : hide ggml_object, ggml_cgraph, ggml_hash_set
ggml-ci
* ggml : add ggml-impl.h to backends
* ggml : fix compiler warnings
ggml-ci
* ggml : add assert upon adding nodes
2024-09-12 14:23:49 +03:00
7ea8d80d53
llava : the function "clip" should be int ( #9237 )
2024-08-30 07:21:57 +02:00
436787f170
llama : fix time complexity of string replacement ( #9163 )
...
This change fixes a bug where replacing text in a very long string could
cause llama.cpp to hang indefinitely. This is because the algorithm used
was quadratic, due to memmove() when s.replace() is called in a loop. It
seems most search results and LLM responses actually provide the O(n**2)
algorithm, which is a great tragedy. Using a builder string fixes things
2024-08-26 09:09:53 +03:00
f63f603c87
llava : zero-initialize clip_ctx structure fields with aggregate initialization 908)
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com >
2024-08-21 09:45:49 +02:00
2f3c1466ff
llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. ( #8984 )
...
* llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model.
- The CLIP model now prioritizes the Vulkan backend over the CPU when vulkan available.
- A GGML_OP_ACC shader has been added.
- The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com >
* fix-up coding style.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com >
* Fix-up the missing initial parameter to resolve the compilation warning.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com >
* [fix] Add missing parameters.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com >
* [fix] Use nb1 and nb2 for dst.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com >
* Fix check results ggml_acc call
---------
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com >
Co-authored-by: 0cc4m <picard12@live.de >
2024-08-20 21:00:00 +02:00
d565bb2fd5
llava : support MiniCPM-V-2.6 ( #8967 )
...
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* updata cmakelist
* updata cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when need resize iamge smaller
* receive review comments and modify
* receive review comments and modify
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove in line 33 directory in the /cmakelists.txt (not in example, in the main dir
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* remove load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* support minicpmv2.6
* modify convert script of minicpmv
* modify convert
* modify convert
* add readme
* add resampler of v2.6
* modify clip
* modify readme
* fix type-check
* fix type-check
* fix type-check
* fix type-check
* modify convert script and readme
* fix convert script and readme
* fix convert
* fix num in convert
* fix type-check
---------
Co-authored-by: Hongji Zhu <fireyoucan@gmail.com >
Co-authored-by: harvestingmoon <leewenyeong@gmail.com >
2024-08-16 16:34:41 +03:00
45a55b91aa
llama : better replace_all (cont) ( #8926 )
...
* llama : better replace_all (cont)
ggml-ci
* code : deduplicate replace_all
ggml-ci
2024-08-09 18:23:52 +03:00
3071c0a5f2
llava : support MiniCPM-V-2.5 ( #7599 )
...
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* updata cmakelist
* updata cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when need resize iamge smaller
* receive review comments and modify
* receive review comments and modify
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove in line 33 directory in the /cmakelists.txt (not in example, in the main dir
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* remove load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* clip : style changes
* del common.h in clip
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix makefile error
* fix ubuntu-make error
* try fix clip
* try fix 1
---------
Co-authored-by: Hongji Zhu <fireyoucan@gmail.com >
Co-authored-by: harvestingmoon <leewenyeong@gmail.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2024-08-09 13:33:53 +03:00
2b1f616b20
ggml : reduce hash table reset cost ( #8698 )
...
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
2024-07-27 04:41:55 +02:00
1bdd8ae19f
[CANN] Add Ascend NPU backend ( #6035 )
...
* [CANN] Add Ascend NPU backend
Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.
CANN (Compute Architecture of Neural Networks), developped by
Huawei, is a heterogeneous computing architecture for AI.
Co-authored-by: wangshuai09 <391746016@qq.com >
* delete trailing whitespaces
* Modify the code based on review comment
* Rename LLAMA_CANN to GGML_CANN
* Make ggml-common.h private
* add ggml_cann prefix for acl funcs
* Add logging for CANN backend
* Delete Trailing whitespace
---------
Co-authored-by: wangshuai09 <391746016@qq.com >
2024-07-17 14:23:50 +03:00
9b31a40c6d
clip : suppress unused variable warnings ( #8105 )
...
* clip : suppress unused variable warnings
This commit suppresses unused variable warnings for the variables e in
the catch blocks.
The motivation for this change is to suppress the warnings that are
generated on Windows when using the MSVC compiler. The warnings are
not displayed when using GCC because GCC will mark all catch parameters
as used.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com >
* squash! clip : suppress unused variable warnings
Remove e (/*e*/) instead instead of using GGML_UNUSED.
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com >
2024-06-27 01:50:09 +02:00
d11afd6652
llava : fix moondream support ( #7163 )
...
* Revert "Revert "llava : add support for moondream vision language model (#6899 )""
This reverts commit 9da243b36a
.
* Fix num_positions and embeddings initialization
2024-05-10 09:41:10 +03:00