* Bug fix for clamp_f32
When using tensors larger than 1d clamp operation does not work due to the restriction of returning if ith is not 0.
* Bug fix for clamp_f32
* Bug fix for clamp_f32
* server : use common_token_to_piece instead of common_detokenize
This commit replaces the call to common_detokenize with
common_token_to_piece in the populate_token_probs.
The motivation for this change is to avoid an issue where
common_detokenize would remove the word boundary character for tokens,
which caused a regression in the server generated token probabilities.
Resolves: https://github.com/ggerganov/llama.cpp/issues/11728
* squash! server : use common_token_to_piece instead of common_detokenize
Use common_token_to_piece for post_sampling_probs as well.
* server : (webui) introduce conversation branching + idb storage
* mark old conv as "migrated" instead deleting them
* improve migration
* add more comments
* more clarification
Technically the fixed width types come only from iostream and
cstdint/stdint.h headers. memory and vector headers should not provide
these. In GCC 15 the headers are cleaned up and you require the proper
header cstdint.
src/llama-mmap.h:26:5: error: ‘uint32_t’ does not name a type
26 | uint32_t read_u32() const;
| ^~~~~~~~
* redo Settings modal UI
* add python code interpreter
* fix auto scroll
* build
* fix overflow for long output lines
* bring back sticky copy button
* adapt layout on mobile view
* fix multiple lines output and color scheme
* handle python exception
* better state management
* add webworker
* add headers
* format code
* speed up by loading pyodide on page load
* (small tweak) add small animation to make it feels like claude
After the barrier in last iteration is executed, still the loop termination
condition will be executed. However main thread can destroy the cgraph object
and its nodes already, then another thread will access it, but the thing is already gone.
Also trouble can happen when n_nodes == 0 or abort is called, but I'm not sure if the
prior situation is possible.
Last syncronization should be done after the loop to ensure the cgraph/cplan won't be
accessed after the main thread exits from the function.