summary: Reference: provider-specific transcript sanitization and repair rules
read_when:
  • You are debugging provider request rejections tied to transcript shape
  • You are changing transcript sanitization or tool-call repair logic
  • You are investigating tool-call id mismatches across providers
title: Transcript Hygiene

Transcript Hygiene (Provider Fixups)

This document describes provider-specific fixes applied to transcripts before a run, while the model context is being built. These are in-memory adjustments that satisfy strict provider requirements; they do not rewrite the stored JSONL transcript on disk. A separate session-file repair pass, however, may rewrite a malformed JSONL file by dropping invalid lines before the session is loaded. When that repair occurs, the original file is backed up alongside the session file.

Scope includes:

  • Tool call id sanitization
  • Tool call input validation
  • Tool result pairing repair
  • Turn validation / ordering
  • Thought signature cleanup
  • Image payload sanitization
  • User-input provenance tagging (for inter-session routed prompts)

If you need transcript storage details, see:


Where this runs

All transcript hygiene is centralized in the embedded runner:

  • Policy selection: src/agents/transcript-policy.ts
  • Sanitization/repair application: sanitizeSessionHistory in src/agents/pi-embedded-runner/google.ts

The policy uses provider, modelApi, and modelId to decide what to apply.

Separate from transcript hygiene, session files are repaired (if needed) before load:

  • repairSessionFileIfNeeded in src/agents/session-file-repair.ts
  • Called from run/attempt.ts and compact.ts (embedded runner)
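The session-file repair pass can be pictured as a line filter over JSONL. This is a minimal sketch, assuming a "valid line" is simply one that `JSON.parse` accepts; `repairJsonlText` and `JsonlRepairResult` are hypothetical names, not the signature of `repairSessionFileIfNeeded`, and file I/O plus the backup step are left to the caller.

```typescript
// Hypothetical result shape; not the real repairSessionFileIfNeeded signature.
interface JsonlRepairResult {
  repaired: string; // JSONL text containing only lines that parse
  dropped: number; // non-empty lines that failed to parse
  changed: boolean; // true when at least one line was dropped
}

// Keep every line that JSON.parse accepts; blank lines are discarded silently.
function repairJsonlText(text: string): JsonlRepairResult {
  const kept: string[] = [];
  let dropped = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      JSON.parse(line);
      kept.push(line);
    } catch {
      dropped += 1; // e.g. a line truncated by a crash mid-write
    }
  }
  const repaired = kept.length > 0 ? kept.join("\n") + "\n" : "";
  return { repaired, dropped, changed: dropped > 0 };
}

// Usage: the caller would back up and rewrite the file only when changed is true.
const result = repairJsonlText('{"type":"message"}\n{"type":"mess\n');
console.log(result.dropped, result.changed); // → 1 true
```

Keeping the filter pure lets the caller decide whether a rewrite is needed before touching the file, so a backup is only written when something was actually dropped.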

Global rule: image sanitization

Image payloads are always sanitized: oversized base64 images are downscaled and recompressed so that providers do not reject the request for exceeding size limits.

Implementation:

  • sanitizeSessionMessagesImages in src/agents/pi-embedded-helpers/images.ts
  • sanitizeContentBlocksImages in src/agents/tool-images.ts
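Only the detection half of image sanitization fits in a dependency-free sketch; the actual downscale/recompress step needs an image library and is not shown. The sketch assumes a hypothetical 5 MiB limit (`MAX_IMAGE_BYTES`); the real threshold and these helper names are not taken from `images.ts` or `tool-images.ts`.

```typescript
// Hypothetical limit; the real threshold is not stated in this document.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// Decoded byte length of a base64 payload, computed without decoding it.
function base64DecodedBytes(b64: string): number {
  const clean = b64.replace(/\s/g, "");
  const padding = /==$/.test(clean) ? 2 : /=$/.test(clean) ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// True when the image should go through downscale/recompress.
function needsImageSanitization(b64: string, maxBytes: number = MAX_IMAGE_BYTES): boolean {
  return base64DecodedBytes(b64) > maxBytes;
}
```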

Global rule: malformed tool calls

Assistant tool-call blocks that are missing both input and arguments are dropped before the model context is built. This prevents provider rejections caused by partially persisted tool calls (for example, after a rate-limit failure).

Implementation:

  • sanitizeToolCallInputs in src/agents/session-transcript-repair.ts
  • Applied in sanitizeSessionHistory in src/agents/pi-embedded-runner/google.ts
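The malformed-tool-call rule amounts to a filter over assistant content blocks. This is a minimal sketch with hypothetical message shapes (`ContentBlock`, `Message`, `dropMalformedToolCalls`); the real `sanitizeToolCallInputs` may use different types and field names.

```typescript
// Hypothetical message shapes for illustration only.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; id: string; name: string; input?: unknown; arguments?: unknown };

interface Message {
  role: "user" | "assistant" | "toolResult";
  content: ContentBlock[];
}

// Drop assistant tool calls that have neither `input` nor `arguments`.
function dropMalformedToolCalls(messages: Message[]): Message[] {
  return messages.map((msg) => {
    if (msg.role !== "assistant") return msg;
    const content = msg.content.filter(
      (block) =>
        block.type !== "toolCall" ||
        block.input !== undefined ||
        block.arguments !== undefined,
    );
    // Reuse the original object when nothing was removed.
    return content.length === msg.content.length ? msg : { ...msg, content };
  });
}
```

The function returns new arrays instead of mutating its input, matching the in-memory nature of these fixups: the stored transcript is left alone.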

Global rule: inter-session input provenance

When an agent sends a prompt into another session via sessions_send (including agent-to-agent reply/announce steps), OpenClaw persists the created user turn with:

  • message.provenance.kind = "inter_session"

This metadata is written at transcript append time and does not change the role (role: "user" remains for provider compatibility). Transcript readers can use it to avoid treating routed internal prompts as end-user-authored instructions.

During context rebuild, OpenClaw also prepends a short [Inter-session message] marker to those user turns in-memory so the model can distinguish them from external end-user instructions.
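The in-memory marker step can be sketched as a map over user turns. Only `provenance.kind = "inter_session"` and the `[Inter-session message]` text come from this document; the `UserTurn` shape, the function name, and the newline separator between marker and original text are assumptions.

```typescript
// Hypothetical turn shape; only provenance.kind = "inter_session" is from the doc.
interface UserTurn {
  role: "user";
  content: string;
  provenance?: { kind: string };
}

const INTER_SESSION_MARKER = "[Inter-session message]";

// Prepend the marker in memory; the role stays "user" and the stored turn is untouched.
function tagInterSessionTurns(turns: UserTurn[]): UserTurn[] {
  return turns.map((turn) =>
    turn.provenance?.kind === "inter_session"
      ? { ...turn, content: `${INTER_SESSION_MARKER}\n${turn.content}` } // separator is an assumption
      : turn,
  );
}
```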


Provider matrix (current behavior)

OpenAI / OpenAI Codex

  • Image sanitization only.
  • On model switch into OpenAI Responses/Codex, drop orphaned reasoning signatures (standalone reasoning items without a following content block).
  • No tool call id sanitization.
  • No tool result pairing repair.
  • No turn validation or reordering.
  • No synthetic tool results.
  • No thought signature stripping.
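For the OpenAI Responses/Codex model-switch rule, one plausible reading of "orphaned" is a reasoning item whose immediately following block is not real content. The sketch assumes that reading and a hypothetical `AssistantBlock` shape; a caller would apply it only when switching into OpenAI Responses/Codex.

```typescript
// Hypothetical block shapes for illustration only.
type AssistantBlock =
  | { type: "reasoning"; signature: string }
  | { type: "text"; text: string }
  | { type: "toolCall"; id: string };

// Keep a reasoning block only when the very next block is text or a tool call.
function dropOrphanedReasoning(blocks: AssistantBlock[]): AssistantBlock[] {
  return blocks.filter((block, i) => {
    if (block.type !== "reasoning") return true;
    const next = blocks[i + 1];
    return next !== undefined && next.type !== "reasoning";
  });
}
```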

Google (Generative AI / Gemini CLI / Antigravity)

  • Tool call id sanitization: strict alphanumeric.
  • Tool result pairing repair and synthetic tool results.
  • Turn validation (Gemini-style turn alternation).
  • Google turn ordering fixup (prepend a tiny user bootstrap turn if the history starts with an assistant turn).
  • Antigravity Claude: normalize thinking signatures; drop unsigned thinking blocks.
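Two of the Google fixups are simple enough to sketch directly: strict alphanumeric id sanitization and the user-bootstrap prepend. The fallback id `"toolcall"`, the bootstrap text, and the `Turn` shape are hypothetical; the document only says the bootstrap is tiny.

```typescript
// Hypothetical turn shape for illustration only.
interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Strict mode: keep only [A-Za-z0-9]; the fallback for an empty result is an assumption.
function sanitizeToolCallIdStrict(id: string): string {
  const cleaned = id.replace(/[^A-Za-z0-9]/g, "");
  return cleaned.length > 0 ? cleaned : "toolcall";
}

// Hypothetical bootstrap text; Gemini-style histories must open with a user turn.
const BOOTSTRAP_USER_TEXT = "(continue)";

function ensureUserFirst(history: Turn[]): Turn[] {
  if (history.length > 0 && history[0].role === "assistant") {
    return [{ role: "user", text: BOOTSTRAP_USER_TEXT }, ...history];
  }
  return history;
}
```

Whatever id mapping is used has to be applied to both the tool call and its matching tool result so the pair still lines up. Stripping characters can also make distinct ids collide (`call_1` and `call-1` both become `call1`), which a real implementation would need to guard against.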

Anthropic / Minimax (Anthropic-compatible)

  • Tool result pairing repair and synthetic tool results.
  • Turn validation (merge consecutive user turns to satisfy strict alternation).
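Tool result pairing repair and Anthropic-style turn merging can be sketched over a simplified message list. Everything here is hypothetical: the `Msg` union, the placeholder text `"missing tool result"`, and the blank-line join used when merging user turns. Orphaned tool results that follow no assistant call are not handled in this sketch.

```typescript
// Hypothetical message shapes for illustration only.
type Msg =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCallIds: string[] }
  | { role: "toolResult"; toolCallId: string; text: string; synthetic?: boolean };

// Pairing repair: every tool call needs a result right after its assistant turn;
// missing results get a synthetic placeholder.
function repairToolResultPairing(messages: Msg[]): Msg[] {
  const out: Msg[] = [];
  let i = 0;
  while (i < messages.length) {
    const msg = messages[i];
    out.push(msg);
    i++;
    if (msg.role !== "assistant" || msg.toolCallIds.length === 0) continue;
    const answered = new Set<string>();
    // Copy the run of tool results directly after this assistant turn.
    while (i < messages.length) {
      const next = messages[i];
      if (next.role !== "toolResult") break;
      answered.add(next.toolCallId);
      out.push(next);
      i++;
    }
    for (const id of msg.toolCallIds) {
      if (!answered.has(id)) {
        out.push({ role: "toolResult", toolCallId: id, text: "missing tool result", synthetic: true });
      }
    }
  }
  return out;
}

// Turn validation: fold consecutive user turns into one to keep strict alternation.
function mergeConsecutiveUserTurns(messages: Msg[]): Msg[] {
  const out: Msg[] = [];
  for (const msg of messages) {
    const prev = out[out.length - 1];
    if (msg.role === "user" && prev !== undefined && prev.role === "user") {
      out[out.length - 1] = { role: "user", text: `${prev.text}\n\n${msg.text}` };
    } else {
      out.push(msg);
    }
  }
  return out;
}
```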

Mistral (including model-id based detection)

  • Tool call id sanitization: strict9 (alphanumeric, exactly 9 characters).
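Mistral's strict9 mode needs exactly nine alphanumeric characters, which plain stripping cannot guarantee. One option, assumed here rather than taken from OpenClaw, is to keep an id that is already nine alphanumerics after stripping and otherwise derive nine base36 characters from a 32-bit FNV-1a hash of the original id.

```typescript
// 32-bit FNV-1a; used only to derive a short deterministic id.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// strict9: exactly 9 characters from [A-Za-z0-9].
function sanitizeToolCallIdStrict9(id: string): string {
  const alnum = id.replace(/[^A-Za-z0-9]/g, "");
  if (alnum.length === 9) return alnum;
  // Base36 of a 32-bit value is at most 7 chars, so left-pad with zeros to 9.
  return ("000000000" + fnv1a32(id).toString(36)).slice(-9);
}
```

Because the mapping is deterministic, a tool call and its result map to the same id without shared state. A 32-bit hash fills at most seven base36 digits (the rest is zero padding), so collisions are possible in long sessions; a wider hash would reduce that risk.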

OpenRouter Gemini

  • Thought signature cleanup: strip non-base64 thought_signature values (keep base64).
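The OpenRouter Gemini cleanup keeps signatures that look like standard base64 and drops the rest. This sketch assumes the standard alphabet with `=` padding (no URL-safe variant) and a hypothetical `ThoughtPart` shape; only the `thought_signature` field name comes from this document.

```typescript
// Hypothetical part shape; only the thought_signature field name is from the doc.
interface ThoughtPart {
  text: string;
  thought_signature?: string;
}

// Standard base64 alphabet, optional "=" padding, length a multiple of 4.
// Empty strings are treated as invalid signatures here (an assumption).
function isBase64(value: string): boolean {
  return value.length > 0 && value.length % 4 === 0 && /^[A-Za-z0-9+\/]*={0,2}$/.test(value);
}

function stripNonBase64Signatures(parts: ThoughtPart[]): ThoughtPart[] {
  return parts.map((part) => {
    const sig = part.thought_signature;
    if (sig === undefined || isBase64(sig)) return part; // keep base64 as-is
    const copy: ThoughtPart = { ...part };
    delete copy.thought_signature; // drop only the bad signature, keep the text
    return copy;
  });
}
```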

Everything else

  • Image sanitization only.
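The matrix above, combined with the policy inputs named earlier (provider, modelApi, modelId), can be condensed into a resolver. This is a hedged sketch of the idea behind src/agents/transcript-policy.ts, not its actual code: the `TranscriptPolicy` shape, the provider strings, and the `"anthropic-messages"` API value are assumptions, and the Antigravity Claude signature handling and the OpenAI model-switch rule are omitted.

```typescript
// Hypothetical policy shape, condensed from the provider matrix.
interface TranscriptPolicy {
  sanitizeImages: boolean; // global rule: always true
  toolCallIdMode: "none" | "strict" | "strict9";
  repairToolResultPairing: boolean; // includes synthetic tool results
  validateTurns: "none" | "gemini" | "anthropic";
  stripNonBase64ThoughtSignatures: boolean;
}

// Provider ids and the modelApi value below are assumptions, not OpenClaw's real strings.
const GOOGLE_PROVIDERS = new Set(["google", "google-gemini-cli", "google-antigravity"]);
const ANTHROPIC_COMPATIBLE = new Set(["anthropic", "minimax"]);

function resolveTranscriptPolicy(provider: string, modelApi: string, modelId: string): TranscriptPolicy {
  const id = modelId.toLowerCase();
  // Default: the "everything else" row, image sanitization only.
  const policy: TranscriptPolicy = {
    sanitizeImages: true,
    toolCallIdMode: "none",
    repairToolResultPairing: false,
    validateTurns: "none",
    stripNonBase64ThoughtSignatures: false,
  };
  if (GOOGLE_PROVIDERS.has(provider)) {
    policy.toolCallIdMode = "strict";
    policy.repairToolResultPairing = true;
    policy.validateTurns = "gemini";
  } else if (ANTHROPIC_COMPATIBLE.has(provider) || modelApi === "anthropic-messages") {
    policy.repairToolResultPairing = true;
    policy.validateTurns = "anthropic";
  } else if (provider === "mistral" || id.includes("mistral")) {
    policy.toolCallIdMode = "strict9"; // model-id based detection
  } else if (provider === "openrouter" && id.includes("gemini")) {
    policy.stripNonBase64ThoughtSignatures = true;
  }
  return policy;
}
```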

Historical behavior (pre-2026.1.22)

Before the 2026.1.22 release, OpenClaw applied multiple layers of transcript hygiene:

  • A transcript-sanitize extension ran on every context build and could:
    • Repair tool use/result pairing.
    • Sanitize tool call ids (including a non-strict mode that preserved _/-).
  • The runner also performed provider-specific sanitization, which duplicated work.
  • Additional mutations occurred outside the provider policy, including:
    • Stripping <final> tags from assistant text before persistence.
    • Dropping empty assistant error turns.
    • Trimming assistant content after tool calls.

This complexity caused cross-provider regressions (notably openai-responses call_id|fc_id pairing). The 2026.1.22 cleanup removed the extension, centralized logic in the runner, and made OpenAI no-touch beyond image sanitization.