Dowon
576c82eda2
vocab : add midm-2.0 model pre-tokenizer ( #14626 )
2025-07-11 09:36:04 +02:00
Ryan Mangeno
4bb625b713
Smoldocling support ( #14597 )
* support for smoldocling
* fixed merge conflicts
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com >
* merge conflicts
* pre-tokenizer merge fix
* convert : fix smollm3 jinja template (#14586 )
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* support for smoldocling
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* fixed merge conflicts
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* safetensors tensor mapping
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* added back clean spaces for hunyuan (accidentally removed)
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* updated hash and reordered model list
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update include/llama.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf_update.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* removed old tensor name
* removed tensor mappings -> handled by smolvlm
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com >
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
Co-authored-by: compilade <git@compilade.net >
2025-07-10 19:41:00 +02:00
Dowon
ffd59e7d18
model : add skt/A.X-4.0 model vocabulary ( #14589 )
2025-07-09 11:22:31 +03:00
ibrahim khadraoui
04655063c4
model : add support for Falcon-H1 family ( #14534 )
* v1
* push more fixes
* another fix
* fix
* more fixes
* minor fix
* more cleaning on python code
* python fixes
* changed precision for multipliers float 32->64
* fixes
* another fix
* fix
* pre-norm -> norm
* fix
* Revert "fix"
This reverts commit 243e4d1a50.
* fix
* small fix ffn_norm
* try
* mix instead of max
* fix vocab size
* conflict solve
* fixed multipliers
* falcon-h1 specific vocab resolved
* read arch from gguf.MODEL_ARCH
* mamba_d_ssm added to d_inner find_hparam
* remove unused functions from gguf_writer.py
* override modify_tensors instead of get_tensors
* fix conversion and d_inner
* added some cb functions for debugging purposes
* inp_out_ids moved outside of layers loop
* mup_vec create as float64
* fix rope_theta
* injected mup
* clean ups
* rm extra space
* rm unused MAMBA_CHUNK_SIZE
* rm unused key
* add bos False
* changed ROPE_TYPE
* cleaning debugging stuff
* cleaning debug quant
* fix comment
* some cleanups
* some cleanups
* Update src/llama-model-loader.cpp
* more cleanups
* moe cleanups
* d_ssm -> d_inner;
* cleaning unused hparams
* cleanup
* more cleanups
* more cleanups on python conversion;
* minor cleanups
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* remove todo
* added falcon-h1
* tensor not required
* clean
* remove unneeded attributes
* more cleanups and fixed conversion
* remove final_norm
* flake8 fixes
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* flake8 fixes
* Update src/llama-hparams.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-arch.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* added hashes
* Update src/llama-arch.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update src/llama-vocab.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* update the update file
* Revert "update the update file"
This reverts commit 082ab4ad2a.
* fix: address suggestions
* fix: update convert_hf_to_gguf.py
* Update gguf-py/gguf/constants.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model-loader.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* d_inner fixed
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* reshaping ssm_norm for 34B
* removing generate_mup
* remove duplicates metadata keys
* rm comment
* final comment
* fix unused args
* fix constants
* fix bad merge
* Update src/llama-model.cpp
Co-authored-by: compilade <git@compilade.net >
* falcon-h1: remove unused ssm_in_b and bad merge
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* falcon-h1: fix last comment
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net >
* falcon-h1: revert add_add_bos(False)
* falcon-h1: fix tied weights
* falcon-h1: remove whitespace
* falcon-h1: fix wrong size param
* falcon-h1: fix whitespace issues
---------
Co-authored-by: younesbelkada <younes.belkada@tii.ae >
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
Co-authored-by: compilade <git@compilade.net >
2025-07-09 10:03:49 +02:00
Xuan-Son Nguyen
8f22dc0a53
model : add hunyuan moe ( #14425 )
* model : add hunyuan moe
* tokenizer ok
* fix tensor name
* cgraph init
* chat template
* wip
* almost working
* skip embed, fix bos
* cleanup
* yarn scaling
* cleanup
* correct rope type
* failed token fix
* ntk alpha freq_base
* tokenization working
* cleanup and pr changes
* vocab_size sanity check
* ntk alpha generic
* Update convert_hf_to_gguf.py
* Apply suggestions from code review
* fix regression
* fix style
---------
Co-authored-by: kooshi <1934337+kooshi@users.noreply.github.com >
2025-07-08 11:24:06 +03:00
Sigbjørn Skjæret
22015b2092
lint : remove trailing whitespace ( #14304 )
2025-06-20 16:37:44 +02:00
Ruikai Peng
dd6e6d0b6a
vocab : prevent tokenizer overflow ( #14301 )
* vocab : prevent stack overflow in tokenize
* vocab : return error instead of aborting on oversized token count
* vocab : INT32_MIN from llama_tokenize on overflow
2025-06-20 07:13:06 -07:00
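A minimal sketch of how a caller can consume the behavior described above, assuming the current llama.h tokenization API (the helper name and initial buffer size are illustrative): llama_tokenize() returns the negated required token count when the output buffer is too small, and, per this change, INT32_MIN instead of aborting when the token count itself would overflow int32_t.

```cpp
#include <cstdint>
#include <string>
#include <vector>

#include "llama.h"

// Tokenize text, growing the buffer on demand. Returns an empty vector when
// llama_tokenize() reports overflow via INT32_MIN (checked before negating,
// since -INT32_MIN is undefined behavior).
static std::vector<llama_token> tokenize_checked(const llama_vocab * vocab, const std::string & text) {
    std::vector<llama_token> tokens(text.size() + 2); // heuristic initial size

    int32_t n = llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                               tokens.data(), (int32_t) tokens.size(),
                               /*add_special=*/true, /*parse_special=*/false);
    if (n == INT32_MIN) {
        return {}; // oversized token count: propagate as an error, do not abort
    }
    if (n < 0) {
        tokens.resize((size_t) -n); // buffer too small: -n is the required count
        n = llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                           tokens.data(), (int32_t) tokens.size(), true, false);
    }
    tokens.resize((size_t) n);
    return tokens;
}
```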
Sigbjørn Skjæret
88fc854b4b
llama : improve sep token handling ( #14272 )
2025-06-20 14:04:09 +02:00
fanyang
456af35eb7
build : suppress gcc15 compile warnings ( #14261 )
* Change _contains_any() substrs to std::string_view and fix the find comparison logic.
2025-06-19 14:49:48 +02:00
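The bullet above is terse, so here is a hedged reconstruction of the described shape of the fix (the real helper in the llama.cpp sources may differ in signature and call sites): the substring parameters become std::string_view, avoiding temporary std::string construction, and the find() result is compared against npos explicitly.

```cpp
#include <initializer_list>
#include <string>
#include <string_view>

// Sketch: returns true if str contains any of the given substrings.
// string_view needles avoid per-call std::string temporaries, and the result
// of find() is tested against npos rather than used in a boolean/signed way.
static bool _contains_any(const std::string & str, std::initializer_list<std::string_view> substrs) {
    for (std::string_view s : substrs) {
        if (str.find(s) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```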
Bartowski
d7da8dc83a
model : Add support for Arcee AI's upcoming AFM model ( #14185 )
* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
* Remove accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-06-16 01:04:06 +02:00
Georgi Gerganov
fb85a288d7
vocab : fix build ( #14175 )
ggml-ci
2025-06-13 20:03:05 +03:00
Guy Goldenberg
3cfbbdb44e
Merge commit from fork
* vocab : prevent integer overflow during load
* Add static cast and GGML_ABORT
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-06-13 19:20:25 +03:00
Georgi Gerganov
c33fe8b8c4
vocab : prevent heap overflow when vocab is too small ( #14145 )
ggml-ci
2025-06-13 08:03:54 +03:00
Sigbjørn Skjæret
9f47fa5792
vocab : warn about missing mask token ( #14022 )
2025-06-05 09:29:18 +02:00
Sigbjørn Skjæret
5e1c3aed40
convert : fix nomic-bert-moe mask token ( #13757 )
2025-06-01 18:07:21 +02:00
Sigbjørn Skjæret
c3a2624339
vocab : fix ugm tokenizer precision ( #13743 )
2025-05-24 12:29:09 +02:00
Johannes Gäßler
10d2af0eaa
llama/ggml: add LLM training support ( #10544 )
* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
2025-05-12 14:44:49 +02:00
Sigbjørn Skjæret
d2a4ef05c6
vocab : add ByteDance-Seed/Seed-Coder ( #13423 )
2025-05-10 22:08:07 +02:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B ( #13065 )
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
2025-04-23 20:21:59 +02:00
Mikko Juola
971f245b3b
llama : recognize IBM Granite 3.3 FIM tokens ( #12988 )
Granite's FIM tokens are very similar to Qwen's; they just use an underscore
instead of a dash, so for example <fim_middle> instead of <fim-middle>.
Opening up tokenizer_config.json in ibm-granite/granite-3.3-8b-base
shows:
```
"<fim_prefix>",
"<fim_middle>",
"<fim_suffix>",
"<fim_pad>",
...
"<reponame>",
```
2025-04-17 11:37:05 +03:00
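A sketch of what "recognizing" these tokens amounts to, paraphrasing rather than quoting the spelling-based matching in llama-vocab.cpp: FIM token ids are resolved by comparing token text against known spellings, so Granite support means accepting the underscore variants alongside the dashed ones.

```cpp
#include <string>

// Illustrative only - the real matching covers more spellings and all of the
// FIM roles (prefix/suffix/middle/pad and so on).
static bool is_fim_middle_token(const std::string & text) {
    return text == "<fim-middle>"   // dashed spelling, already recognized
        || text == "<fim_middle>";  // Granite's underscore spelling
}
```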
Yuxuan Zhang
06bb53ad9b
llama-model : add Glm4Model implementation for GLM-4-0414 ( #12867 )
* GLM-4-0414
* use original one
* Using with tensor map
* fix bug
* change order
* change order
* format with flake8
2025-04-11 12:10:10 +02:00
Xuan-Son Nguyen
1466621e73
llama : Support llama 4 text-only ( #12791 )
* llama4 conversion
* initial support, no chat template
* clean up a bit
* fix tokenizer conversion
* correct hparams
* try this
* fix shexp
* ffn_inp_normed
* chat template
* clean up model conversion
* add_bos
* add scale_before_ffn
* fix order
* weight_before_ffn
* llm_graph_input_attn_temp
* add chunk attn mask
* build_inp_attn_scale()
* add comment about ggml_repeat
* clarify comments
* fix build
2025-04-07 23:06:44 +02:00
yumeyao
5dd5d1ab00
vocab : use string_view::find() to avoid unnecessary looking up beyond the fragment range ( #12706 )
2025-04-03 18:32:54 +03:00
Sigbjørn Skjæret
83a88bd6af
vocab : BailingMoE : change possessive quantifiers to greedy ( #12677 )
2025-04-02 11:21:48 +02:00
Daniel Bevenius
c80a7759da
vocab : add special infill tokens for CodeLlama ( #11850 )
* vocab : add special infill tokens for CodeLlama
The commit adds the following special tokens for CodeLlama infill:
- `▁<PRE>`
- `▁<SUF>`
- `▁<MID>`
The motivation for this is that currently the infill example uses
CodeLlama as a suggested model. But when using this model the following
error is generated:
```console
/llama.cpp-debug/examples/infill/infill.cpp:165: GGML_ASSERT(llama_vocab_fim_pre(vocab) >= 0) failed
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
305251 Aborted (core dumped)
./build/bin/llama-infill -t 10 -ngl 0 -m models/codellama-13b.Q5_K_S.gguf \
-c 4096 --temp 0.7 --repeat_penalty 1.1 -n 20 \
--in-prefix "def helloworld():\n print(\"hell" \
--in-suffix "\n print(\"goodbye world\")\n "
```
* squash! vocab : add special infill tokens for CodeLlama
Add _<EOT> as well.
2025-03-31 18:40:56 +02:00
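The failing assertion in the trace above, GGML_ASSERT(llama_vocab_fim_pre(vocab) >= 0), shows the contract these tokens satisfy: the llama_vocab_fim_*() getters only return a valid id once the vocab defines the corresponding special token. A small pre-flight check an infill caller could do instead of asserting (the helper name is illustrative):

```cpp
#include "llama.h"

// Returns true only if the vocab defines the FIM prefix/suffix/middle special
// tokens; the getters return a negative id when a token is missing, which is
// exactly what the infill example's assert used to trip on.
static bool vocab_supports_infill(const llama_vocab * vocab) {
    return llama_vocab_fim_pre(vocab) >= 0 &&
           llama_vocab_fim_suf(vocab) >= 0 &&
           llama_vocab_fim_mid(vocab) >= 0;
}
```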
Sigbjørn Skjæret
2c3f8b850a
llama : support BailingMoE (Ling) ( #12634 )
2025-03-30 22:21:03 +02:00
Juyoung Suk
b3de7cac73
llama : add Trillion 7B model support ( #12556 )
* Support Trillion 7B
* Update llama.h
* Update llama.h
* Update llama-vocab.cpp for Trillion
* Update llama-vocab.cpp
2025-03-30 20:38:33 +02:00
compilade
00d53800e0
llama-vocab : add SuperBPE pre-tokenizer ( #12532 )
2025-03-24 11:47:24 +01:00
mgroeber9110
5bbe6a9fe9
ggml : portability fixes for VS 2017 ( #12150 )
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com >
2025-03-04 18:53:26 +02:00
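A sketch of the macro shape the last two bullets describe, assuming rather than quoting ggml's actual conditions: MSVC accepts the C99 restrict keyword only when C11/C17 support is enabled, so older language modes fall back to the compiler-specific __restrict spelling.

```cpp
// Assumed shape, not the exact ggml header logic. ggml's C sources then use
// GGML_RESTRICT on pointer parameters instead of the bare keyword, which VS
// 2017's plain C mode does not understand.
#if defined(_MSC_VER) && !defined(__STDC_VERSION__)
#    define GGML_RESTRICT __restrict
#else
#    define GGML_RESTRICT restrict
#endif
```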
Xuan-Son Nguyen
c43a3e7996
llama : add Phi-4-mini support (supersede #12099 ) ( #12108 )
* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style
---------
Co-authored-by: Nicholas Sparks <nisparks@microsoft.com >
2025-02-28 12:44:11 +01:00
mgroeber9110
ffd0821c57
vocab : correctly identify LF token for GPT-2 style BPE tokenizer ( #11496 )
2025-01-30 12:10:59 +02:00
lexasub
a5203b4465
llama : minor fixes for up llama load model speed ( #11448 )
* impl::load : change the bpe_ranks map to an unordered map, reducing impl::load time by ~30%
* llama_model_loader::init_mapping : replace new llama_mmap with std::make_unique<llama_mmap> for cleaner code, halving init_mappings run time
* Update src/llama-vocab.cpp
---------
Co-authored-by: lexasub <empty@empty.ru >
Co-authored-by: Diego Devesa <slarengh@gmail.com >
2025-01-27 14:42:09 +01:00
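The first bullet is concrete enough to sketch, with the caveat that the key type and hash below are assumptions about the change rather than a quote of it: bpe_ranks maps a pair of merge-candidate strings to a rank, and moving it from std::map to std::unordered_map trades ordered string-pair comparisons for amortized O(1) hashed lookups, which requires supplying a pair hash.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// std::unordered_map has no default hash for std::pair, so combine the two
// string hashes (boost::hash_combine-style mixing).
struct pair_str_hash {
    size_t operator()(const std::pair<std::string, std::string> & p) const {
        const size_t h1 = std::hash<std::string>{}(p.first);
        const size_t h2 = std::hash<std::string>{}(p.second);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// Before: std::map<std::pair<std::string, std::string>, int> bpe_ranks;
using bpe_ranks_t = std::unordered_map<std::pair<std::string, std::string>, int, pair_str_hash>;
```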
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model ( #11310 )
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
a133566d34
vocab : fix double-eos check ( #11273 )
ggml-ci
2025-01-17 09:28:00 +02:00
Georgi Gerganov
bbf3e55e35
vocab : add dummy tokens for "no_vocab" type ( #11231 )
* vocab : add dummy tokens for "no_vocab" type
ggml-ci
* vocab : minor [no ci]
2025-01-14 11:54:58 +01:00
Daniel Bevenius
8f70fc3d1b
llama : remove 'd' from bad special token log ( #11212 )
This commit removes the 'd' from the log message in llama-vocab.cpp
when logging a bad special token.
The motivation for this is that currently the output can look something
like the following:
```console
load: bad special token:
'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```
2025-01-13 13:38:20 +01:00
Georgi Gerganov
08f10f69c3
llama : remove notion of CLS token ( #11064 )
ggml-ci
2025-01-12 12:15:53 +02:00
Georgi Gerganov
afa8a9ec9b
llama : add llama_vocab, functions -> methods, naming ( #11110 )
* llama : functions -> methods (#11110 )
* llama : add struct llama_vocab to the API (#11156 )
ggml-ci
* hparams : move vocab params to llama_vocab (#11159 )
ggml-ci
* vocab : more pimpl (#11165 )
ggml-ci
* vocab : minor tokenization optimizations (#11160 )
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* lora : update API names (#11167 )
ggml-ci
* llama : update API names to use correct prefix (#11174 )
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174 )
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174 )
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com >
2025-01-12 11:32:42 +02:00
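Since this entry is mostly a rename inventory, a short sketch of the post-rename call sites may help; the old names in the comments come from the bullets above, and everything here uses functions present in llama.h after this commit.

```cpp
#include <cstdio>

#include "llama.h"

// Vocab properties are now queried through the llama_vocab handle.
static void print_vocab_info(const llama_model * model) {
    const llama_vocab * vocab = llama_model_get_vocab(model);

    const int32_t n_tokens = llama_vocab_n_tokens(vocab);     // was llama_vocab_n_vocab
    const bool    add_bos  = llama_vocab_get_add_bos(vocab);  // was llama_vocab_add_bos
    const bool    add_eos  = llama_vocab_get_add_eos(vocab);  // was llama_vocab_add_eos

    printf("n_tokens=%d add_bos=%d add_eos=%d\n", n_tokens, add_bos, add_eos);
}
```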
Georgi Gerganov
727368c60f
llama : use LLAMA_TOKEN_NULL ( #11062 )
ggml-ci
2025-01-06 10:52:15 +02:00
fairydreaming
9394bbd484
llama : Add support for DeepSeek V3 ( #11049 )
* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com >
2025-01-04 21:06:11 +01:00
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp ( #10902 )
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]
2025-01-03 10:18:53 +02:00
Georgi Gerganov
30caac3a68
llama : the WPM vocabs use the CLS token as BOS ( #10930 )
* llama : the WPM vocabs use the CLS token as BOS
ggml-ci
* llama : add comment
2024-12-24 09:44:20 +02:00
Georgi Gerganov
0bf2d10c55
tts : add OuteTTS support ( #10784 )
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add mathematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder
2024-12-18 19:27:21 +02:00
Georgi Gerganov
08ea539df2
unicode : improve naming style ( #10838 )
* unicode : improve naming style
ggml-ci
* cont [no ci]
2024-12-16 12:31:45 +02:00
Riccardo Orlando
6fe6247831
llama : add Minerva 7B model support ( #10673 )
* Support for Minerva 7B
* Update convert_hf_to_gguf_update.py
2024-12-05 20:30:59 +02:00
wwoodsTM
ff252ea48e
llama : add DRY sampler ( #9702 )
* sampling : add DRY sampler (post-refactor)
* DRY: Trying to fix coauthors, removed unneeded line
* DRY: Fixed redundant code
* DRY: Fixed crash issue due to DRY being in chain but uninitialized
---------
Co-authored-by: l3utterfly <gc.pthzfoldr@gmail.com >
Co-authored-by: pi6am <34464159+pi6am@users.noreply.github.com >
2024-10-25 19:07:34 +03:00
Georgi Gerganov
99bd4ac28c
llama : infill sampling handle very long tokens ( #9924 )
* llama : infill sampling handle very long tokens
ggml-ci
* cont : better indices
ggml-ci
2024-10-17 22:32:47 +03:00
Daniel Bevenius
9e04102448
llama : suppress conversion from 'size_t' to 'int' ( #9046 )
* llama : suppress conversion from 'size_t' to 'int'
This commit updates llm_tokenizer_spm.tokenize to suppress/remove the
following warnings that are generated on Windows when using MSVC:
```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument':
conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument':
conversion from 'size_t' to 'int', possible loss of data
```
This is done by adding a cast for the size_t returned from
symbols.size(). I believe this is safe as it seems unlikely that
symbols, which stores an entry for each UTF8 character, would become
larger than INT_MAX.
The motivation for this change is to reduce the number of warnings that
are currently generated when building on Windows.
* squash! llama : suppress conversion from 'size_t' to 'int'
Move cast into for loop.
2024-10-16 20:34:28 +03:00
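The message above already carries the reasoning; the final shape of the fix (cast moved into the for loop, per the squash note) looks roughly like this, with llm_symbol reduced to a stand-in:

```cpp
#include <vector>

struct llm_symbol { /* ... */ }; // stand-in for the tokenizer's symbol type

// Keep the index as int and cast symbols.size() in the comparison, silencing
// MSVC's C4267 size_t -> int warning. Safe under the commit's assumption that
// symbols (one entry per UTF-8 character) never exceeds INT_MAX entries.
static void process_symbols(std::vector<llm_symbol> & symbols) {
    for (int i = 0; i < (int) symbols.size(); ++i) {
        // ... tokenize symbols[i] ...
    }
}
```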
Georgi Gerganov
755a9b2bf0
llama : add infill sampler ( #9896 )
ggml-ci
2024-10-15 16:35:33 +03:00
Georgi Gerganov
11ac9800af
llama : improve infill support and special token detection ( #9798 )
* llama : improve infill support
ggml-ci
* llama : add more FIM token strings
ggml-ci
* server : update prompt on slot restore (#9800 )
* gguf : deprecate old FIM token KVs
2024-10-12 08:21:51 +03:00