
Tool-scoped Memory

The tool-scoped memory layer captures actionable guidance about how the agent uses specific tools. It is distinct both from the general-purpose retrieval offered by Memory Tools and from the tool_effectiveness statistics namespace. It is the surface that turns "never email Sarah" into a hard constraint the agent must follow on every subsequent turn.

What's Stored

Each tool has its own namespace tool-{tool_name}, distinct from global, skill-{id}, and the statistics-only tool_effectiveness namespace. Each entry in it is a ToolMemoryRule:

| Field | Purpose |
| --- | --- |
| id | Stable per-rule UUID. Upserts replay the same id. |
| tool_name | Tool the rule applies to (e.g., send_email, shell) |
| rule | Natural language guidance the agent must follow |
| priority | critical, high, or normal. Drives retrieval and compression. |
| source | Origin of the rule: user_explicit, post_turn, or programmatic |
| tags | Free-form tags (safety, permission, etc.) |
| created_at / updated_at | RFC 3339 timestamps |

By design, statistics (tool_effectiveness/tool/{name}) and rules (tool-{name}/rule/{id}) live in different namespaces: one tracks "what happened", the other tracks "what to do".
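The record and key layout described above can be sketched in Rust. This is a minimal illustration, not the project's actual definition: the field names come from the table, but the concrete types, the enum names, and the helper functions rule_namespace, rule_key, and stats_key are assumptions made for the example.

```rust
// Hypothetical sketch: field names follow the documented table, while the
// concrete types and the key helpers below are assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Priority {
    Critical,
    High,
    Normal,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    UserExplicit,
    PostTurn,
    Programmatic,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolMemoryRule {
    pub id: String,         // stable per-rule UUID, reused on upsert
    pub tool_name: String,  // e.g. "send_email" or "shell"
    pub rule: String,       // natural language guidance
    pub priority: Priority,
    pub source: Source,
    pub tags: Vec<String>,
    pub created_at: String, // RFC 3339 timestamp, kept as text here
    pub updated_at: String, // RFC 3339 timestamp, kept as text here
}

/// Namespace holding one tool's rules: `tool-{tool_name}`.
pub fn rule_namespace(tool_name: &str) -> String {
    format!("tool-{}", tool_name)
}

/// Full key of one rule: `tool-{name}/rule/{id}`.
pub fn rule_key(tool_name: &str, id: &str) -> String {
    format!("{}/rule/{}", rule_namespace(tool_name), id)
}

/// Key of a tool's statistics entry: `tool_effectiveness/tool/{name}`.
pub fn stats_key(tool_name: &str) -> String {
    format!("tool_effectiveness/tool/{}", tool_name)
}
```

Keeping the two key builders separate mirrors the split between "what happened" and "what to do": nothing written through rule_key can land in the statistics namespace.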

Priority Levels

| Priority | Storage Location | Compression-resistant? |
| --- | --- | --- |
| critical | Pinned to the system prompt via ToolMemoryRulesSection | Yes: the system prompt is frozen during the session and is never rewritten by the mid-session compressor |
| high | Same system prompt block, after critical | Yes: same mechanism |
| normal | Stored in the namespace; retrieved on demand via memory_recall | No: subject to compression like any other namespace memory |

The compression-resistance property is structural: critical and high rules ride on the system prompt, which is kept frozen (together with its prefix cache) for the whole session. The mid-session compressor never rewrites it, so token compression cannot silently delete a critical or high rule.
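As a rough illustration of the placement rule, the sketch below splits rules into those pinned to the system prompt and those left for on-demand recall. The names is_pinned and split_by_placement are hypothetical, and rules are reduced to (priority, text) pairs for brevity.

```rust
// Hypothetical sketch of the placement decision; not the harness's code.

// Variant order matters: the derived Ord gives Critical < High < Normal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Critical,
    High,
    Normal,
}

impl Priority {
    /// Critical and high rules ride on the frozen system prompt.
    pub fn is_pinned(self) -> bool {
        matches!(self, Priority::Critical | Priority::High)
    }
}

type RuleEntry = (Priority, String);

/// Returns (pinned, on_demand). Pinned rules are ordered critical first,
/// then high; on-demand rules keep their input order.
pub fn split_by_placement(rules: Vec<RuleEntry>) -> (Vec<RuleEntry>, Vec<RuleEntry>) {
    let (mut pinned, on_demand): (Vec<RuleEntry>, Vec<RuleEntry>) =
        rules.into_iter().partition(|(p, _)| p.is_pinned());
    // sort_by_key is stable, so rules of equal priority keep their order.
    pinned.sort_by_key(|(p, _)| *p);
    (pinned, on_demand)
}
```

Only the second list is exposed to compression, which is why the guarantee depends on priority alone and not on how often a rule is used.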

Capture Pipeline

Two automatic capture paths trigger after each turn (via ToolMemoryCaptureHook):

  1. User decrees — Sentences like never <verb> <noun>, don't <verb> ..., do not <verb> ..., or stop <verb>ing ... in user messages are promoted to Critical rules on matching tools
  2. Repeated tool failures — Tools that fail two or more times in a single turn get a Normal priority observation, summarized inline so the agent has context when considering that tool next time
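The two capture paths above can be approximated with the standard library alone. This is a simplified sketch, not the logic of ToolMemoryCaptureHook: it only checks whether a sentence starts with one of the listed decree phrases, it does not map decrees to tools, and the function names extract_decrees and repeated_failures are assumptions.

```rust
use std::collections::HashMap;

// Decree openers taken from the list above. "stop " is a simplification of
// the "stop <verb>ing ..." pattern.
const DECREE_PREFIXES: [&str; 4] = ["never ", "don't ", "do not ", "stop "];

/// Splits text into sentences. A '.', '!' or '?' ends a sentence only when
/// followed by whitespace or end of input, so "sarah@example.com" stays whole.
fn sentences(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    let mut start = 0;
    for (i, &b) in bytes.iter().enumerate() {
        let terminator = matches!(b, b'.' | b'!' | b'?');
        let at_boundary = i + 1 == bytes.len() || bytes[i + 1].is_ascii_whitespace();
        if b == b'\n' || (terminator && at_boundary) {
            out.push(text[start..i].trim());
            start = i + 1;
        }
    }
    out.push(text[start..].trim());
    out.retain(|s| !s.is_empty());
    out
}

/// Returns the sentences of a user message that look like decrees.
pub fn extract_decrees(message: &str) -> Vec<String> {
    sentences(message)
        .into_iter()
        .filter(|s| {
            let lower = s.to_lowercase();
            DECREE_PREFIXES.iter().any(|p| lower.starts_with(p))
        })
        .map(str::to_string)
        .collect()
}

/// Given (tool_name, succeeded) pairs for one turn, returns the tools that
/// failed two or more times, sorted by name.
pub fn repeated_failures(calls: &[(&str, bool)]) -> Vec<String> {
    let mut failures: HashMap<&str, usize> = HashMap::new();
    for &(tool, ok) in calls {
        if !ok {
            *failures.entry(tool).or_insert(0) += 1;
        }
    }
    let mut tools: Vec<String> = failures
        .into_iter()
        .filter(|&(_, n)| n >= 2)
        .map(|(t, _)| t.to_string())
        .collect();
    tools.sort();
    tools
}
```

In this sketch a decree would then be stored as a Critical rule and a repeated failure as a Normal observation, matching the priorities named in the list.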

Retrieval at Tool Selection

At session start, the harness prefetches every Critical and High rule via ToolMemoryStore::rules_for_prompt, renders them into a ## Tool-scoped rules block, and pins it to the system prompt. Because the prompt is frozen for the session lifetime, rules are visible at every tool selection and before any actual tool execution.

Lower-priority guidance stays outside the prompt budget; the agent fetches it on-demand by calling memory_recall against the tool-{name} namespace.
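A possible shape for the rendering step is sketched below. It is not the real ToolMemoryStore::rules_for_prompt: the Rule struct and the render_rules_block function are hypothetical, and the per-tool heading and bullet format are inferred from the example in the safety case on this page.

```rust
use std::collections::BTreeMap;

// Variant order matters: the derived Ord gives Critical < High < Normal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Critical,
    High,
    Normal,
}

impl Priority {
    fn label(self) -> &'static str {
        match self {
            Priority::Critical => "critical",
            Priority::High => "high",
            Priority::Normal => "normal",
        }
    }
}

pub struct Rule {
    pub tool_name: String,
    pub rule: String,
    pub priority: Priority,
}

/// Renders critical and high rules as a Markdown block grouped by tool.
/// Normal rules are skipped; they stay available through memory_recall.
pub fn render_rules_block(rules: &[Rule]) -> String {
    // BTreeMap keeps tools in a deterministic (alphabetical) order.
    let mut by_tool: BTreeMap<&str, Vec<&Rule>> = BTreeMap::new();
    for r in rules.iter().filter(|r| r.priority != Priority::Normal) {
        by_tool.entry(r.tool_name.as_str()).or_default().push(r);
    }
    if by_tool.is_empty() {
        return String::new();
    }
    let mut out = String::from("## Tool-scoped rules\n");
    for (tool, mut tool_rules) in by_tool {
        tool_rules.sort_by_key(|r| r.priority); // critical before high
        out.push_str(&format!("\n### `{}`\n", tool));
        for r in tool_rules {
            out.push_str(&format!("- [{}] {}\n", r.priority.label(), r.rule));
        }
    }
    out
}
```

Rendering once at session start and never again is what keeps the block byte-identical across turns in this sketch.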

RPC Interface

Six methods are exposed under the memory namespace:

| Method | Purpose |
| --- | --- |
| memory.tool_rule_put | Upsert a rule. Use priority='critical' for safety-critical entries |
| memory.tool_rule_get | Get a rule by (tool_name, id) |
| memory.tool_rule_list | List all rules for a tool, sorted by priority + freshness |
| memory.tool_rule_delete | Delete a rule |
| memory.tool_rules_for_prompt | Returns rendered Markdown block + structured snapshot |
| memory.tool_rules_json | Raw JSON list |
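The semantics of the put, get, list, and delete methods can be illustrated with an in-memory stand-in. This sketch does not show the RPC wire format, which this page does not specify. InMemoryRuleStore is hypothetical, updated_at is simplified to an integer timestamp, and "freshness" is assumed to mean most recently updated first.

```rust
use std::collections::HashMap;

// Variant order matters: the derived Ord gives Critical < High < Normal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Critical,
    High,
    Normal,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub id: String,
    pub tool_name: String,
    pub rule: String,
    pub priority: Priority,
    pub updated_at: u64, // assumption: seconds since epoch, not RFC 3339 text
}

/// Hypothetical in-memory store keyed by (tool_name, id).
#[derive(Default)]
pub struct InMemoryRuleStore {
    rules: HashMap<(String, String), Rule>,
}

impl InMemoryRuleStore {
    /// Upsert: writing a rule with an existing (tool_name, id) replaces it.
    pub fn put(&mut self, rule: Rule) {
        self.rules
            .insert((rule.tool_name.clone(), rule.id.clone()), rule);
    }

    pub fn get(&self, tool_name: &str, id: &str) -> Option<&Rule> {
        self.rules.get(&(tool_name.to_string(), id.to_string()))
    }

    /// Rules for one tool, highest priority first, then newest first.
    pub fn list(&self, tool_name: &str) -> Vec<&Rule> {
        let mut out: Vec<&Rule> = self
            .rules
            .values()
            .filter(|r| r.tool_name == tool_name)
            .collect();
        out.sort_by(|a, b| {
            a.priority
                .cmp(&b.priority)
                .then(b.updated_at.cmp(&a.updated_at))
        });
        out
    }

    /// Returns true if a rule was actually removed.
    pub fn delete(&mut self, tool_name: &str, id: &str) -> bool {
        self.rules
            .remove(&(tool_name.to_string(), id.to_string()))
            .is_some()
    }
}
```

Because put keys on (tool_name, id), replaying the same id updates the rule in place instead of creating a duplicate, which is the upsert behavior the id field describes.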

End-to-end Safety Case

The "never email Sarah" path is covered by regression tests:

  1. User says "never email Sarah at sarah@example.com" in a turn that invokes send_email
  2. ToolMemoryCaptureHook extracts the decree, maps the email alias to the send_email tool, and writes a Critical rule under tool-send_email/rule/{uuid}
  3. In the next session, prefetch_tool_memory_rules_blocking pulls every Critical and High rule, and the session builder appends ToolMemoryRulesSection to the system prompt
  4. The agent sees "### `send_email`" followed by "- [critical] never email Sarah at sarah@example.com." before tool selection, and this rule survives any mid-session token compression

Next Steps