* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
* Prefilling assistant message in openai compatible API
* fixed indentation
* fixed code convention
* simplify method usage
* no more than one assistant message at end of messages
* merge checks into prefill code
* Update examples/server/utils.hpp
---------
Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* llava : add clip_n_output_tokens, deprecate clip_n_patches
* mtmd : add qwen2vl and qwen2.5vl
* decode_embd_batch::set_position_...
* working version
* deprecate llama-qwen2vl-cli
* correct order W, H of clip_embd_nbytes_by_img
* edit existing line in hot topics
* clip : refactor set input for cgraph
* more strict assert
* minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere
* split qwen2 and qwen2.5 code blocks
* minor style fix
* Add --override-tensors option to llama-bench
* Correct llama-bench --override-tensors to --override-tensor
* llama-bench: Update --override-tensors parsing to match --tensor-split, appear in test matrix.
* Make new llama-bench util functions static to fix Ubuntu CI
* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)
* implment vision model architecture, gguf convertor
* handle window attention inputs
* add debug utils
* fix few incorrect tensor memory layout
* move position id remap out of ggml to avoid int32 cuda operations
* cleaning up
* ignore transformers Qwen2_5_xxx type check
* remove not so often use `qwen2vl-cli` debug functions
* remove commented-out code blocks
* fix attn weight scaling after rebase
* add `PROJECTOR_TYPE_QWEN2_5_VL`
* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`
* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`
* remove `attn_window_size` from gguf
* fix model conversion
* clean up
* fix merging problem
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* cmake : do not include ./src as public for libllama
ggml-ci
* cmake : rework tests
ggml-ci
* llguidance : remove unicode include
ggml-ci
* cmake : make c++17 private
ggml-ci
* arg : clean up handling --mmproj with -hf
* rm change about no_mmproj
* Revert "rm change about no_mmproj"
This reverts commit 2cac8e0efb.
* handle no_mmproj explicitly
* skip download mmproj on examples not using it
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`
* support for minicpmv
* remove cpp files of llava and minicpmv
* update hot topics
* mtmd : add not supported msg for qwen2vl
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This restores the behavior from #491. This does not affect Ctrl+D's ability to
terminate --multiline-input lines (#1040).
This also actually implements #587: "If the user wants the text to end in a
newline, this should be accomplished by explicitly adding a newline by using
\ followed by return, then returning control by pressing return again."
Fixes#12949
* server : use std::move whenever possible
* use r-value ref
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* make task creation scoped
* restore std::move
* fix task_id not set correctly
* apply changes from suggestion
Co-authored-by: ggerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-define ID (fixed)
* fix prompt_modified
* rm redundant data member