llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-06-27 03:55:20 +00:00

Author	SHA1	Message	Date
Gabe Goodhart	d590cd4c24	model : Granite MoE shared (#13269 ) * feat: Add GGUF conversion for granitemoeshared Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: hparam and arch plumbing for granitemoeshared Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Split MoE fused tensors for shared experts in conversion Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: First WIP cut at model arch in cpp The hparam and architecture plumbing should be correct, but the implementation of the shared experts seems to still be broken. Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Cleaner (maybe more correct?) splitting for gate/up Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Fix the input to the shared experts I had misread that the shared experts take the inputs _before_ the standard MoE layer and was feeding the output of the MoE to the shared experts. Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Avoid architecture-specific checks for Granite MoE Shared This is a cleaner way that will allow more flexibility in architecture strings going forward. Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Split granite architectures out of llm_build_llama This helps de-clutter the llama-family graph construction and allows granite to diverge further (in preparation for Granite 4). NOTE: I removed the granite scale factors from llm_build_deci because they appear to only be there as copy-paste from llm_build_llama. The HF config does not seem to set those values: https://huggingface.co/Deci/DeciLM-7B/blob/main/config.json Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Fix compiler warning about uninitialized inp_pos This should not have been reachable, but it warns on some compliers Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Consoladate GraniteMoEShared into GraniteMoE for conversion Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Consolidate GraniteMoEShared into GraniteMoE on the c++ side Branch: GraniteMoEShared Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-05-13 15:12:01 +02:00
City	3eac209319	mtmd : support InternVL 3 38B and 78B mmproj (#13443 ) * Support InternVL 3 38B and 78B mmproj * Swap norms in clip.cpp * Group variables together	2025-05-11 11:35:52 +02:00
Xuan-Son Nguyen	053367d149	mtmd : support InternVL 2.5 and 3 (#13422 ) * convert : internvl support * InternVL3-1B working * fix regression * rm mobilevlm from test * fix conversion * add test for internvl * add to list of pre-quant * restore boi/eoi check * add clarify comment for norm eps	2025-05-10 16:26:42 +02:00
compilade	a7366faa5b	gguf-py : avoid requiring pyside6 for other scripts (#13036 ) - gguf-py : remove gguf-py/gguf/scripts/__init__.py because it's not needed Implicit namespaces are supported since Python 3.3 (https://peps.python.org/pep-0420/), and the entrypoints in pyproject.toml can directly refer to the main functions.	2025-05-05 22:27:31 -04:00
Xuan-Son Nguyen	5215b91e93	clip : fix confused naming ffn_up and ffn_down (#13290 ) * clip : fix confused naming ffn_up and ffn_down * rm ffn_i/o/g naming * rename n_embd, n_ff * small fix * no check n_ff	2025-05-05 12:54:44 +02:00
Jared Van Bortel	2f567611c0	llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245 )	2025-05-02 11:42:30 -04:00
Xuan-Son Nguyen	074e42ab31	convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (#13209 ) * wip * qwen2.5vl ok * vision: fix models missing "text_config" * add test * fix test repo name * fix 32B model * Revert "fix 32B model" This reverts commit `651752f1ae`. * clarify about 32B * rm qwen surgery script * update llava/readme * move V_ENC_EMBD_PATCH handling to Qwen2VLVisionModel	2025-05-02 17:17:15 +02:00
Xuan-Son Nguyen	8936784f7a	mtmd : add vision support for Mistral Small 3.1 (#13231 ) * convert ok * load ok, missing patch merger * ah sheet it works * update llava/readme * add test * fix test	2025-05-01 17:05:42 +02:00
AT	5f5e39e1ba	model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466 ) * Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture - Adds MoE-based embedding model supporting multilingual embeddings. - Selects architecture variant based on hyperparameter detection (MoE layers). - Removes unnecessary subclass initialization checks for clarity. https://www.nomic.ai/blog/posts/nomic-embed-text-v2 Co-authored-by: Jared Van Bortel <jared@nomic.ai> * fix tokenizer * don't rename this tensor --------- Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2025-04-28 22:52:15 +03:00
Xuan-Son Nguyen	ecda2ec4b3	mtmd : Support Pixtral 12B (#13065 ) * add pixtral text model (vision is wip) * cgraph ok, just missing 2D RoPE * fix bad rebase * first working version * fix problem with img_break token * support dynamic image size * update docs * update test script	2025-04-23 20:21:59 +02:00
Xuan-Son Nguyen	dc39a5e7a8	mtmd : support SmolVLM (version 1 and 2) (#13050 ) * mtmd : support SmolVLM (version 1 and 2) * correct chat template * fix n_patches * scale_factor is an int * add more models to test	2025-04-22 16:24:54 +02:00
Xuan-Son Nguyen	2016f07bd1	convert : experimental support for `--mmproj` flag (#13023 ) * convert : experimental support for `--mmproj` flag * fix bad ctrl+f replace * fix style * split into subclasses TextModel and VisionModel * rename Mode --> ModelBase * small fix * correct CLIP_VISION arch name (because existing GGUF already use it) * Apply suggestions from code review Co-authored-by: compilade <git@compilade.net> * fix Mistral3Model * fix typo Co-authored-by: compilade <git@compilade.net> --------- Co-authored-by: compilade <git@compilade.net>	2025-04-20 23:29:36 +02:00
Sigbjørn Skjæret	fb28f4f80e	gguf-py : fix upload python package workflow (#13020 )	2025-04-19 16:26:38 +02:00
Chris Thompson	aff9d107b0	gguf-py : GGUF Editor GUI - Python + Qt6 (#12930 )	2025-04-18 20:30:41 +02:00
Juk Armstrong	daa422881a	llama : DeepSeek V2/V3 MLA implementation (#12801 ) * Merged using squash to remove all noise commit messages * Force flash attention off for `LLM_ARCH_DEEPSEEK2` - embedding too large * Removed 3 conts (2x RoPE and 1x RMS-norm) * Changed to use `<cmath>` instead of `<math.h>` * Reverted removal of the 3 conts * Used `reshape` in `llm_graph_context::build_attn_mha()` * Use `k_pe = ggml_reshape` * Removed the 3 conts again * Removed the 3D views of `wk_b` and `wv_b`, and just save and 3D in GGUF * Removed MQA optimisation from `build_attn_mha()` as no gains now * Simplified `is_mla` branch in `llm_build_deepseek2()` * Removed `build_attn_mla` and added `nullptr` to all `build_atnn` calls * Fixed call to `build_attn` in `llm_build_t5_enc`	2025-04-15 09:49:57 +03:00
Yuxuan Zhang	06bb53ad9b	llama-model : add Glm4Model implementation for GLM-4-0414 (#12867 ) * GLM-4-0414 * use original one * Using with tensor map * fix bug * change order * change order * format with flask8	2025-04-11 12:10:10 +02:00
Xuan-Son Nguyen	5b1f13cb64	convert : proper tensor name mapping for llama4 (#12870 ) * Llama-4 mapping * remove hacky renaming --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-04-11 09:23:37 +02:00
Xuan-Son Nguyen	64eda5deb9	convert : ability to lazy-load safetensors remotely without downloading to disk (#12820 ) * gguf util : add SafetensorRemote * fix style * convert: add --remote option * convert : allow using lazy remote tensors It's a bit slow for now since everything is blocking and single-threaded. * correct metadata.name * small style fix * support HF_TOKEN * convert : use writeable buffer for remote lazy tensors * convert : fix flake8 lint regarding lamdba assigment * multithreaded download * multithread: print debug * fix style * Revert "multithreaded download" This reverts commit `42fc895ace`. * bring back _get_request_headers --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2025-04-10 17:24:44 +02:00
Bo Zheng	d3bd7193ba	llama : Support Qwen3 and Qwen3MoE (#12828 ) * add qwen3 & qwen3moe support. * fix --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com>	2025-04-09 11:47:36 +02:00
compilade	a226bc7a9a	gguf-py : support lazy tensor splitting (#12809 ) * gguf-py : support lazy tensor splitting Splitting usually involves returning tuples of tensors, which need to be handled properly to avoid early eager evaluation. * gguf-py : fix flake8 lint	2025-04-08 09:03:07 +02:00
Xuan-Son Nguyen	1466621e73	llama : Support llama 4 text-only (#12791 ) * llama4 conversion * initial support, no chat template * clean up a bit * fix tokenizer conversion * correct hparams * try this * fix shexp * ffn_inp_normed * chat template * clean up model conversion * add_bos * add scale_before_ffn * fix order * weight_before_ffn * llm_graph_input_attn_temp * add chunk attn mask * build_inp_attn_scale() * add comment about ggml_repeat * clarify comments * fix build	2025-04-07 23:06:44 +02:00
Sigbjørn Skjæret	2c3f8b850a	llama : support BailingMoE (Ling) (#12634 )	2025-03-30 22:21:03 +02:00
Si1w	f125b8dccf	llama : add PLM GGUF Conversion & Inference Support (#12457 ) * add edgellm model arch[conversation feature doesn't work] * remove output.weight layer for edgellm arch * [Model] update the name of the model * update the name of model arch in convert gguf * [Model] Refarctor the model arch into llama-model * [Bug] Fix the bug in create attn kv * [Code] Fix editorconfig erros * [Code] Remove Trailing whitespace * [Code] Remove Trailing whitespace * [Code] Change the order of model arch in list * [Code] Fix flake8 Lint errors * Remove trailing white space * [Code] Remove call in model arch	2025-03-27 12:49:15 +02:00
Xuan-Son Nguyen	fbdfefe74e	llama : gemma3 : use output tensor if it exists in model weight (#12506 ) * llama : gemma3 : use output tensor if it exists in model weight * also add to the llm_tensor_names	2025-03-22 23:28:19 +01:00
Sigbjørn Skjæret	a686171ea7	convert : Support chat_template.json (#12460 )	2025-03-19 08:58:13 +01:00
Molly Sophia	7dfad387e3	llama: Add support for RWKV v7 architecture (#12412 ) * ggml: Add op l2_norm Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * ggml: Add op rwkv_wkv7 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: Add support for RWKV7 and ARWKV7 models Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: fix inference with RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: add more (a)rwkv7 variants in size Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Apply code-format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * fix MUSA build Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: fix shape error with rwkv using llama-parallel Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-03-18 07:27:50 +08:00
Xuan-Son Nguyen	7841fc723e	llama : Add Gemma 3 support (+ experimental vision capability) (#12343 ) * llama : Add Gemma 3 text-only support * fix python coding style * fix compile on ubuntu * python: fix style * fix ubuntu compile * fix build on ubuntu (again) * fix ubuntu build, finally * clip : Experimental support for Gemma 3 vision (#12344) * clip : Experimental support for Gemma 3 vision * fix build * PRId64	2025-03-12 09:30:24 +01:00
Xuan-Son Nguyen	06c2b1561d	convert : fix Norway problem when parsing YAML (#12114 ) * convert : fix Norway problem when parsing YAML * Update gguf-py/gguf/metadata.py * add newline at correct place	2025-02-28 17:44:46 +01:00
Sigbjørn Skjæret	69050a11be	Refactor gguf scripts to improve metadata handling (#11909 ) * Refactor gguf scripts to improve metadata handling Added contents method to ReaderField class Added endianess property to GGUFReader class * update scripts * fix import * remove unused import * attempt to work around flake and pyright errors * second attempt * give up, ignore type * bump version * apply newbyteorder fixes	2025-02-26 08:04:48 -05:00
Aleksei Nikiforov	3567ee3a94	gguf-py: enable reading non-native endian files (#12081 ) Currently self.byte_order is never used. Actually use it to byteswap read data to allow reading big endian files on little endian systems and vice versa. Now it's possible to convert little-endian model into a big-endian model and back on a little-endian system.	2025-02-26 11:39:27 +00:00
Aleksei Nikiforov	651adf4b66	gguf_convert_endian.py: implement byteswapping for q4_k and q6_k (#11349 )	2025-02-24 11:27:01 +00:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886 ) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
piDack	0cec062a63	llama : add support for GLM-Edge and GLM-Edge-V series models (#10573 ) * add glm edge chat model * use config partial_rotary_factor as rope ratio * support for glm edge model * vision model support * remove debug info * fix format * llava.cpp trailing whitespace * remove unused AutoTokenizer * Update src/llama.cpp for not contain <\|end\|> or </s> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * add edge template * fix chat template * fix confict * fix confict * fix ci err * fix format err * fix template err * 9b hf chat support * format * format clip.cpp * fix format * Apply suggestions from code review * Apply suggestions from code review * Update examples/llava/clip.cpp * fix format * minor : style --------- Co-authored-by: liyuhang <yuhang.li@zhipuai.cn> Co-authored-by: piDack <pcdack@hotmail.co> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: liyuhang <yuhang.li@aminer.cn> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-02 09:48:46 +02:00
Georgi Gerganov	08f10f69c3	llama : remove notion of CLS token (#11064 ) ggml-ci	2025-01-12 12:15:53 +02:00
Vinesh Janarthanan	c05e8c9934	gguf-py: fixed local detection of gguf package (#11180 ) * updated path to gguf package for non-installed setups * added reader.py to readme * Bumped gguf version to 0.15.0	2025-01-11 11:42:31 +02:00
Molly Sophia	ee7136c6d1	llama: add support for QRWKV6 model architecture (#11001 ) llama: add support for QRWKV6 model architecture (#11001) * WIP: Add support for RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV: Some graph simplification Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Add support for RWKV6Qwen2 with cpu and cuda GLA Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix some typos Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * code format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix wkv test & add gla test Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix cuda warning Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update README.md Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update ggml/src/ggml-cuda/gla.cu Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix fused lerp weights loading with RWKV6 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * better sanity check skipping for QRWKV6 in llama-quant thanks @compilade Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: compilade <git@compilade.net> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: compilade <git@compilade.net>	2025-01-10 09:58:08 +08:00
Pierrick Hymbert	f8feb4b01a	model: Add support for PhiMoE arch (#11003 ) * model: support phimoe * python linter * doc: minor Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com> * doc: minor Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com> * doc: add phimoe as supported model ggml-ci --------- Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>	2025-01-09 11:21:41 +01:00
Vinesh Janarthanan	8a1d9c25fa	gguf-py : move scripts directory (#11116 ) * Moved scripts dir and fixed pyproject.toml * updated readme * fixed README urls * bump pypi gguf to v0.14.0 * retrigger ci * empty commit - trigger ci	2025-01-08 20:54:58 +02:00
fairydreaming	9394bbd484	llama : Add support for DeepSeek V3 (#11049 ) * convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type * vocab : add DeepSeek V3 pre-tokenizer regexes * unicode : handle ACCENT_MARK and SYMBOL categories in regex * llama : add DeepSeek V3 chat template, handle new model parameters and tensor types --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2025-01-04 21:06:11 +01:00
DAN™	46be942214	llama : add support for the cohere2 model architecture (#10900 )	2025-01-04 16:33:31 +02:00
ymcki	6f0c9e034b	llama : support for Llama-3_1-Nemotron-51B (#10669 ) * conflict resolution * move comments after bracket to its own line	2024-12-23 01:22:33 +01:00
Georgi Gerganov	0bf2d10c55	tts : add OuteTTS support (#10784 ) * server : add "tokens" output ggml-ci * server : output embeddings for all tokens when pooling = none ggml-ci * server : be explicit about the pooling type in the tests ggml-ci * server : do not normalize embeddings when there is no pooling ggml-ci * llama : add OuteTTS support (wip) * wip * extract features * first conv * group norm * resnet conv * resnet * attn * pos net * layer norm * convnext * head * hann window * fix n_embd + remove llama.cpp hacks * compute hann window * fft * spectrum processing * clean-up * tts : receive input text and generate codes * clip : fix new conv name * tts : minor fix * tts : add header + minor fixes ggml-ci * tts : add matchematical constant ggml-ci * tts : fix sampling + cut initial noise * tts : fixes * tts : update default samplers ggml-ci * tts : text pre-processing * tts : outetts-voc -> wavtokenizer-dec * tts : remove hardcoded constants ggml-ci * tts : fix tensor shapes * llama : refactor wavtokenizer tensors ggml-ci * cont ggml-ci * cont [no ci] * llama : update WavTokenizer to non-causal attn * llama : handle no-vocab detokenization * tts : add Python example for OuteTTS (wip) * tts : extend python example to generate spectrogram ggml-ci * server : fix rebase artifacts * tts : enable "return_tokens" in Python example ggml-ci * tts : minor fixes * common : support HF download for vocoder	2024-12-18 19:27:21 +02:00
Valentin Mamedov	a0974156f3	llama : add Deepseek MoE v1 & GigaChat models (#10827 ) * Add deepseek v1 arch & gigachat template * improve template code * add readme * delete comments * remove comment * fix format * lint llama.cpp * fix order of deepseek and deepseek2, move gigachat temlate to the end of func * fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need * remove comments * move deepseek above deepseek2 * change placement of gigachat chat template	2024-12-15 19:02:46 +02:00
Georgi Gerganov	b5ae1ddff9	gguf-py : bump to v0.13.0	2024-12-15 13:16:42 +02:00
HimariO	ba1cb19cdd	llama : add Qwen2VL support + multimodal RoPE (#10361 ) * Barebone Qwen2VL LLM convertor * Add Qwen2VL cli entrypoint * [WIP] add qwen2vl arch * Verify m-rope output * Add vl-rope/2d-rope support for qwen2vl ViT * update qwen2vl cli tool * update 5D tensor op workaround * [WIP] qwen2vl vision model * make batch and clip utils compatible with qwen2vl * [WIP] create inference workflow, gguf convert script but fix * correcting vision-rope behavior, add the missing last layer back to ViT * add arg parser to qwen2vl_surgery * replace variable size array with vector * cuda-gdb cmake preset * add fp32 mrope, vision rope kernel * add fp16 support for qwen2vl and m-rope * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` * fix rope op mode switching, out dated func args * update `llama_hparams` * update to keep up stream changes * resolve linter, test errors * add makefile entry, update speical image padding token * add mrope unit test, fix few compiler warnings * rename `mrope` related function, params * minor updates on debug util, bug fixs * add `m-rope` testcase to `test-backend-ops` * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix traililng whitespce * store `llama_hparams.rope_sections` with fixed size array * update position id tensor size check in GGML_OP_ROPE * minor updates * update `ggml_backend__supports_op` of unsupported backends remote old `rope_section` compare operator --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-14 14:43:46 +02:00
Jett Janiak	4601a8bb67	gguf-py : numpy 2 newbyteorder fix (#9772 )	2024-12-13 16:48:44 +02:00
Georgi Gerganov	fb18934a97	gguf-py : bump version to 0.11.0	2024-12-11 23:13:31 +02:00
Robert Collins	62e84d9848	llama : add 128k yarn context for Qwen (#10698 ) * add 128k yarn context for Qwen * added property for model tensors * removing useless line	2024-12-07 23:12:27 +02:00
Djip007	19d8762ab6	ggml : refactor online repacking (#10446 ) * rename ggml-cpu-aarch64.c to .cpp * reformat extra cpu backend. - clean Q4_0_N_M and IQ4_0_N_M - remove from "file" tensor type - allow only with dynamic repack - extract cpu extra bufts and convert to C++ - hbm - "aarch64" - more generic use of extra buffer - generalise extra_supports_op - new API for "cpu-accel": - amx - aarch64 * clang-format * Clean Q4_0_N_M ref Enable restrict on C++ * add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack * added/corrected control on tensor size for Q4 repacking. * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add debug logs on repacks. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-07 14:37:50 +02:00
Georgi Gerganov	c5ede3849f	convert : add custom attention mapping	2024-12-06 21:33:49 +02:00

206 Commits