* fix(browser): prevent permanent timeout after stuck evaluate Thread AbortSignal from client-fetch through dispatcher to Playwright operations. When a timeout fires, force-disconnect the Playwright CDP connection to unblock the serialized command queue, allowing the next call to reconnect transparently. Key changes: - client-fetch.ts: proper AbortController with signal propagation - pw-session.ts: new forceDisconnectPlaywrightForTarget() - pw-tools-core.interactions.ts: accept signal, align inner timeout to outer-500ms, inject in-browser Promise.race for async evaluates - routes/dispatcher.ts + types.ts: propagate signal through dispatch - server.ts + bridge-server.ts: Express middleware creates AbortSignal from request lifecycle - client-actions-core.ts: add timeoutMs to evaluate type Fixes #10994 * fix(browser): v2 - force-disconnect via Connection.close() instead of browser.close() When page.evaluate() is stuck on a hung CDP transport, browser.close() also hangs because it tries to send a close command through the same stuck pipe. v2 fix: forceDisconnectPlaywrightForTarget now directly calls Playwright's internal Connection.close() which locally rejects all pending callbacks and emits 'disconnected' without touching the network. This instantly unblocks all stuck Playwright operations. closePlaywrightBrowserConnection (clean shutdown) now also has a 3s timeout fallback that drops to forceDropConnection if browser.close() hangs. Fixes permanent browser timeout after stuck evaluate. * fix(browser): v3 - fire-and-forget browser.close() instead of Connection.close() v2's forceDropConnection called browser._connection.close() which corrupts the entire Playwright instance because Connection is shared across all objects (BrowserType, Browser, Page, etc.). This prevented reconnection with cascading 'connectOverCDP: Force-disconnected' errors. v3 fix: forceDisconnectPlaywrightForTarget now: 1. Nulls cached connection immediately 2. Fire-and-forgets browser.close() (doesn't await — it may hang) 3. Next connectBrowser() creates a fresh connectOverCDP WebSocket Each connectOverCDP creates an independent WebSocket to the CDP endpoint, so the new connection is unaffected by the old one's pending close. The old browser.close() eventually resolves when the in-browser evaluate timeout fires, or the old connection gets GC'd. * fix(browser): v4 - clear connecting state and remove stale disconnect listeners The reconnect was failing because: 1. forceDisconnectPlaywrightForTarget nulled cached but not connecting, so subsequent calls could await a stale promise 2. The old browser's 'disconnected' event handler raced with new connections, nulling the fresh cached reference Fix: null both cached and connecting, and removeAllListeners on the old browser before fire-and-forget close. * fix(browser): v5 - use raw CDP Runtime.terminateExecution to kill stuck evaluate When forceDisconnectPlaywrightForTarget fires, open a raw WebSocket to the stuck page's CDP endpoint and send Runtime.terminateExecution. This kills running JS without navigating away or crashing the page. Also clear connecting state and remove stale disconnect listeners. * fix(browser): abort cancels stuck evaluate * Browser: always cleanup evaluate abort listener * Chore: remove Playwright debug scripts * Docs: add CDP evaluate refactor plan * Browser: refactor Playwright force-disconnect * Browser: abort stops evaluate promptly * Node host: extract withTimeout helper * Browser: remove disconnected listener safely * Changelog: note act:evaluate hang fix --------- Co-authored-by: Bob <bob@dutifulbob.com>
9.9 KiB
summary, owner, status, last_updated, title
| summary | owner | status | last_updated | title |
|---|---|---|---|---|
| Plan: isolate browser act:evaluate from Playwright queue using CDP, with end-to-end deadlines and safer ref resolution | openclaw | draft | 2026-02-10 | Browser Evaluate CDP Refactor |
Browser Evaluate CDP Refactor Plan
Context
act:evaluate executes user provided JavaScript in the page. Today it runs via Playwright
(page.evaluate or locator.evaluate). Playwright serializes CDP commands per page, so a
stuck or long running evaluate can block the page command queue and make every later action
on that tab look "stuck".
PR #13498 adds a pragmatic safety net (bounded evaluate, abort propagation, and best-effort
recovery). This document describes a larger refactor that makes act:evaluate inherently
isolated from Playwright so a stuck evaluate cannot wedge normal Playwright operations.
Goals
act:evaluatecannot permanently block later browser actions on the same tab.- Timeouts are single source of truth end to end so a caller can rely on a budget.
- Abort and timeout are treated the same way across HTTP and in-process dispatch.
- Element targeting for evaluate is supported without switching everything off Playwright.
- Maintain backward compatibility for existing callers and payloads.
Non-goals
- Replace all browser actions (click, type, wait, etc.) with CDP implementations.
- Remove the existing safety net introduced in PR #13498 (it remains a useful fallback).
- Introduce new unsafe capabilities beyond the existing
browser.evaluateEnabledgate. - Add process isolation (worker process/thread) for evaluate. If we still see hard to recover stuck states after this refactor, that is a follow-up idea.
Current Architecture (Why It Gets Stuck)
At a high level:
- Callers send
act:evaluateto the browser control service. - The route handler calls into Playwright to execute the JavaScript.
- Playwright serializes page commands, so an evaluate that never finishes blocks the queue.
- A stuck queue means later click/type/wait operations on the tab can appear to hang.
Proposed Architecture
1. Deadline Propagation
Introduce a single budget concept and derive everything from it:
- Caller sets
timeoutMs(or a deadline in the future). - The outer request timeout, route handler logic, and the execution budget inside the page all use the same budget, with small headroom where needed for serialization overhead.
- Abort is propagated as an
AbortSignaleverywhere so cancellation is consistent.
Implementation direction:
- Add a small helper (for example
createBudget({ timeoutMs, signal })) that returns:signal: the linked AbortSignaldeadlineAtMs: absolute deadlineremainingMs(): remaining budget for child operations
- Use this helper in:
src/browser/client-fetch.ts(HTTP and in-process dispatch)src/node-host/runner.ts(proxy path)- browser action implementations (Playwright and CDP)
2. Separate Evaluate Engine (CDP Path)
Add a CDP based evaluate implementation that does not share Playwright's per page command queue. The key property is that the evaluate transport is a separate WebSocket connection and a separate CDP session attached to the target.
Implementation direction:
- New module, for example
src/browser/cdp-evaluate.ts, that:- Connects to the configured CDP endpoint (browser level socket).
- Uses
Target.attachToTarget({ targetId, flatten: true })to get asessionId. - Runs either:
Runtime.evaluatefor page level evaluate, orDOM.resolveNodeplusRuntime.callFunctionOnfor element evaluate.
- On timeout or abort:
- Sends
Runtime.terminateExecutionbest-effort for the session. - Closes the WebSocket and returns a clear error.
- Sends
Notes:
- This still executes JavaScript in the page, so termination can have side effects. The win is that it does not wedge the Playwright queue, and it is cancelable at the transport layer by killing the CDP session.
3. Ref Story (Element Targeting Without A Full Rewrite)
The hard part is element targeting. CDP needs a DOM handle or backendDOMNodeId, while
today most browser actions use Playwright locators based on refs from snapshots.
Recommended approach: keep existing refs, but attach an optional CDP resolvable id.
3.1 Extend Stored Ref Info
Extend the stored role ref metadata to optionally include a CDP id:
- Today:
{ role, name, nth } - Proposed:
{ role, name, nth, backendDOMNodeId?: number }
This keeps all existing Playwright based actions working and allows CDP evaluate to accept
the same ref value when the backendDOMNodeId is available.
3.2 Populate backendDOMNodeId At Snapshot Time
When producing a role snapshot:
- Generate the existing role ref map as today (role, name, nth).
- Fetch the AX tree via CDP (
Accessibility.getFullAXTree) and compute a parallel map of(role, name, nth) -> backendDOMNodeIdusing the same duplicate handling rules. - Merge the id back into the stored ref info for the current tab.
If mapping fails for a ref, leave backendDOMNodeId undefined. This makes the feature
best-effort and safe to roll out.
3.3 Evaluate Behavior With Ref
In act:evaluate:
- If
refis present and hasbackendDOMNodeId, run element evaluate via CDP. - If
refis present but has nobackendDOMNodeId, fall back to the Playwright path (with the safety net).
Optional escape hatch:
- Extend the request shape to accept
backendDOMNodeIddirectly for advanced callers (and for debugging), while keepingrefas the primary interface.
4. Keep A Last Resort Recovery Path
Even with CDP evaluate, there are other ways to wedge a tab or a connection. Keep the existing recovery mechanisms (terminate execution + disconnect Playwright) as a last resort for:
- legacy callers
- environments where CDP attach is blocked
- unexpected Playwright edge cases
Implementation Plan (Single Iteration)
Deliverables
- A CDP based evaluate engine that runs outside the Playwright per-page command queue.
- A single end-to-end timeout/abort budget used consistently by callers and handlers.
- Ref metadata that can optionally carry
backendDOMNodeIdfor element evaluate. act:evaluateprefers the CDP engine when possible and falls back to Playwright when not.- Tests that prove a stuck evaluate does not wedge later actions.
- Logs/metrics that make failures and fallbacks visible.
Implementation Checklist
- Add a shared "budget" helper to link
timeoutMs+ upstreamAbortSignalinto:- a single
AbortSignal - an absolute deadline
- a
remainingMs()helper for downstream operations
- a single
- Update all caller paths to use that helper so
timeoutMsmeans the same thing everywhere:src/browser/client-fetch.ts(HTTP and in-process dispatch)src/node-host/runner.ts(node proxy path)- CLI wrappers that call
/act(add--timeout-mstobrowser evaluate)
- Implement
src/browser/cdp-evaluate.ts:- connect to the browser-level CDP socket
Target.attachToTargetto get asessionId- run
Runtime.evaluatefor page evaluate - run
DOM.resolveNode+Runtime.callFunctionOnfor element evaluate - on timeout/abort: best-effort
Runtime.terminateExecutionthen close the socket
- Extend stored role ref metadata to optionally include
backendDOMNodeId:- keep existing
{ role, name, nth }behavior for Playwright actions - add
backendDOMNodeId?: numberfor CDP element targeting
- keep existing
- Populate
backendDOMNodeIdduring snapshot creation (best-effort):- fetch AX tree via CDP (
Accessibility.getFullAXTree) - compute
(role, name, nth) -> backendDOMNodeIdand merge into the stored ref map - if mapping is ambiguous or missing, leave the id undefined
- fetch AX tree via CDP (
- Update
act:evaluaterouting:- if no
ref: always use CDP evaluate - if
refresolves to abackendDOMNodeId: use CDP element evaluate - otherwise: fall back to Playwright evaluate (still bounded and abortable)
- if no
- Keep the existing "last resort" recovery path as a fallback, not the default path.
- Add tests:
- stuck evaluate times out within budget and the next click/type succeeds
- abort cancels evaluate (client disconnect or timeout) and unblocks subsequent actions
- mapping failures cleanly fall back to Playwright
- Add observability:
- evaluate duration and timeout counters
- terminateExecution usage
- fallback rate (CDP -> Playwright) and reasons
Acceptance Criteria
- A deliberately hung
act:evaluatereturns within the caller budget and does not wedge the tab for later actions. timeoutMsbehaves consistently across CLI, agent tool, node proxy, and in-process calls.- If
refcan be mapped tobackendDOMNodeId, element evaluate uses CDP; otherwise the fallback path is still bounded and recoverable.
Testing Plan
- Unit tests:
(role, name, nth)matching logic between role refs and AX tree nodes.- Budget helper behavior (headroom, remaining time math).
- Integration tests:
- CDP evaluate timeout returns within budget and does not block the next action.
- Abort cancels evaluate and triggers termination best-effort.
- Contract tests:
- Ensure
BrowserActRequestandBrowserActResponseremain compatible.
- Ensure
Risks And Mitigations
- Mapping is imperfect:
- Mitigation: best-effort mapping, fallback to Playwright evaluate, and add debug tooling.
Runtime.terminateExecutionhas side effects:- Mitigation: only use on timeout/abort and document the behavior in errors.
- Extra overhead:
- Mitigation: only fetch AX tree when snapshots are requested, cache per target, and keep CDP session short lived.
- Extension relay limitations:
- Mitigation: use browser level attach APIs when per page sockets are not available, and keep the current Playwright path as fallback.
Open Questions
- Should the new engine be configurable as
playwright,cdp, orauto? - Do we want to expose a new "nodeRef" format for advanced users, or keep
refonly? - How should frame snapshots and selector scoped snapshots participate in AX mapping?