The user wants me to use the deepwiki skill to analyze this GitHub project. Let me read the deepwiki skill file first.
File: ~/projects/public-skills/skills/deepwiki/SKILL.md
---
name: deepwiki
description: "AI-powered local codebase documentation generator. Use for: (1) Deep code analysis with business logic extraction, (2) Comprehensive API documentation with signatures, (3) Architecture diagrams and data flow mapping, (4) Code patterns identification and design decisions. Not for: Simple file listings or code comments only."
metadata: {"openclaw": {"emoji": "📚"}}
---
# DeepWiki — Professional Codebase Documentation Generator
Generate comprehensive, professional documentation that explains **why** and **how**, not just **what**.
## What DeepWiki Generates
DeepWiki goes beyond file listings to create documentation that tells the story of your codebase:
### 📖 Documentation Types
- **Architecture Docs** — System design, components, data flow
- **API References** — Complete function signatures with examples
- **Module Guides** — How to use and extend each part
- **Data Schemas** — Type definitions and validation rules
- **Cookbooks** — Common tasks and solutions
### 🎯 Deep Analysis Features
- **Business Logic Extraction** — Understand what the code actually does
- **Pattern Recognition** — Identify architectural and design patterns
- **Data Flow Mapping** — Trace how data moves through the system
- **Decision Records** — Document why architectural choices were made
- **Code Examples** — Working examples for every API
## Claude Code Usage
Simply describe what documentation you need:
```bash
"Analyze this codebase and generate architecture documentation"
"Document all APIs with TypeScript signatures"
"Create a module guide for the auth system"
"Generate a data schema reference"
"Scan the code and create a cookbook for common tasks"
```
DeepWiki will read your actual code, not just filenames, to create comprehensive documentation.
## When to Use
- User wants to document a new codebase
- Need to understand legacy or complex code
- Onboarding new team members
- Preparing context for AI tools
- Creating public API documentation
## Prerequisites
- Claude Code or OpenClaw with read access to the codebase
- (Optional) `tree` command for directory structure visualization
## OpenClaw Usage
### Commands
```
/deepwiki-scan [project-path] # Deep code analysis
/deepwiki-generate # Full documentation
/deepwiki-api # API reference only
/deepwiki-architecture # Architecture docs only
/deepwiki-modules # Module guides only
/deepwiki-data # Data schemas only
```
## Claude Code Usage
Simply describe what you need (no commands needed):
```bash
"Scan this codebase structure and extract business logic"
"Generate complete documentation with API reference"
"Document the architecture and design patterns"
"Create module guides with usage examples"
"Analyze data types and validation rules"
```
The skill triggers on keywords from the description above.
## For Detailed Guidance
See [SCAN_GUIDE.md](references/SCAN_GUIDE.md) for deep analysis methodology.
See [GENERATE_GUIDE.md](references/GENERATE_GUIDE.md) for documentation templates.
## What Makes DeepWiki Different
| Aspect | Basic Docs | DeepWiki |
|--------|-----------|----------|
| Code understanding | "Has auth module" | "Auth flow: login → validate → token → session" |
| API docs | "Has login function" | Full signature + params + errors + examples |
| Architecture | "Uses React" | Component hierarchy + data flow + state management |
| Patterns | None identified | "Uses Compound Component pattern in BentoGrid" |
| Decisions | Not documented | "Chose this approach because..." |
## Output Structure
```
docs/deepwiki/
├── README.md # Project overview + quick start
├── ARCHITECTURE.md # System design + components
├── API.md # Complete API reference
├── MODULES.md # Module guides + recipes
├── DATA.md # Type schemas + validation
└── reference/
├── ADR.md # Architecture decisions
└── glossary.md # Term definitions
```
## Best For
- 📚 Onboarding new team members
- 🔄 Understanding legacy codebases
- 📝 Preparing for AI context injection
- 🎓 Teaching others about your codebase
- 📦 Creating user-facing documentation
## Tips
1. **Be specific** — "Document the payment flow" vs "Document everything"
2. **Iterate** — Generate, review, then ask for refinements
3. **Focus** — Use `--focus=api` for API-only docs
4. **Examples matter** — DeepWiki includes runnable examples
## Integration
DeepWiki works great with:
- **Claude Code** — Context injection and code understanding
- **GitHub** — Documentation on push to main/PR
- **Onboarding** — New team member documentation
- **API Docs** — Public-facing API reference
This is a massive codebase. Let me look at the docs folder to see what analysis already exists, and then look at a few more critical files to understand the architecture deeply.
# Hidden Features and Model Codenames
> Based on decompiled source analysis of Claude Code v2.1.88
## Model Codename System
Anthropic uses **animal names** as internal model codenames. These codenames are strictly protected to keep them from leaking into external builds.
### Known Codenames
| Codename | Role | Evidence |
|------|------|------|
| **Tengu** | Product/telemetry prefix, possibly also a model | All 250+ analytics events and feature flags use the `tengu_*` prefix |
| **Capybara** | Sonnet-family model, currently v8 | `capybara-v2-fast[1m]`, prompt patches for v8 behavioral issues |
| **Fennec** | Predecessor of Opus 4.6 | Migration: `fennec-latest` → `opus` |
| **Numbat** | Next-generation model | Comment: "Remove this section when we launch numbat" |
### Codename Protection
Undercover mode explicitly lists the protected codenames:
```typescript
// src/utils/undercover.ts:48-49
NEVER include in commit messages or PR descriptions:
- Internal model codenames (animal names like Capybara, Tengu, etc.)
- Unreleased model version numbers (e.g., opus-4-7, sonnet-4-8)
```
The build system scans for leaked codenames with `scripts/excluded-strings.txt`. The Buddy system's species names are constructed via `String.fromCharCode()` so the literals never trip the canary check:
```typescript
// src/buddy/types.ts:10-13
// One species name collides with a model-codename canary in excluded-strings.txt.
// Construct the value at runtime so the literal never appears in the build output
```
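The runtime-construction trick can be sketched as follows (a hypothetical stand-in, not the actual Buddy code):

```typescript
// Build a string at runtime from char codes so the literal never
// appears in the compiled bundle, and therefore never matches a
// canary entry in excluded-strings.txt.
function decodeSpecies(codes: number[]): string {
  return String.fromCharCode(...codes);
}

// The char codes spell out the colliding species name; the literal
// itself is absent from both the source and the build output.
const SPECIES: string = decodeSpecies([99, 97, 112, 121, 98, 97, 114, 97]);
```

A string-scanning canary check only sees the numeric array, so the name survives the build unflagged.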
That colliding species is **capybara** — both a pet species and a model codename.
### Capybara v8 Behavioral Issues
The source reveals specific behavioral issues in Capybara v8:
1. **Spurious stop-sequence triggers** (~10% of the time) — when `<functions>` appears at the tail of the prompt
2. **Empty tool_result produces zero output** — requires a marker-injection workaround
3. **Over-commenting** — requires a dedicated anti-commenting prompt patch
4. **High false-claim rate**: 29-30% for v8 vs 16.7% for v4
5. **Insufficient verification** — requires a "thoroughness counterweight" patch
## Feature Flag Naming Convention
All feature flags use the `tengu_` prefix plus a **random word pair** to obscure their purpose:
| Flag | Purpose |
|------|------|
| `tengu_onyx_plover` | Auto Dream (background memory consolidation) |
| `tengu_coral_fern` | memdir feature |
| `tengu_herring_clock` | Team memory |
| `tengu_frond_boric` | Analytics kill switch |
| `tengu_amber_quartz_disabled` | Voice mode kill switch |
| `tengu_amber_flint` | Agent teams |
## Internal vs. External Users
Anthropic employees (`USER_TYPE === 'ant'`) get notably better treatment:
| Dimension | External users | Internal users (ant) |
|------|---------|--------------|
| Output style | "Be as concise as possible" | "Lean toward more explanation" |
| False-claim mitigation | None | Dedicated Capybara v8 patch |
| Numeric length anchoring | None | "≤25 words between tools, ≤100 words in the final reply" |
| Verification agent | None | Required for non-trivial changes |
| Proactivity | None | "Point out user misconceptions when spotted" |
## Hidden Commands
| Command | Status | Description |
|------|------|------|
| `/btw` | Active | Ask a side question without interrupting the main conversation |
| `/stickers` | Active | Order Claude Code stickers |
| `/thinkback` | Active | 2025 year-in-review |
| `/good-claude` | Placeholder | Hidden stub command |
| `/bughunter` | Placeholder | Hidden stub command |
I've completed a deepwiki-style analysis of the Claude Code source code and saved it as a note. Let me provide a summary to the user.
Yelo wants to see the Deepwiki analysis results. I already created a comprehensive note at ~/notes/claude-code-source-code-analysis.md. Let me read it and present it to Yelo.
File: ~/notes/claude-code-source-code-analysis.md
# Claude Code v2.1.88 Source Deep Dive
**Source**: https://github.com/sanbuphy/claude-code-source-code
**Date**: 2026-04-01
**Tags**: AI, Claude, Agent, Code Analysis
---
## 📋 Project Overview
A decompilation and reverse-engineering pass over the source of the Claude Code v2.1.88 npm package (`@anthropic-ai/claude-code`). Maintained by `sanbuphy`, 18k+ stars.
### Disclaimer
> All source code in this repository is the intellectual property of Anthropic and Claude. This repository is provided strictly for technical research, study, and educational exchange among enthusiasts. **Commercial use is strictly prohibited.**
### Statistics
| Metric | Value |
|------|------|
| Source files (.ts/.tsx) | ~1,884 |
| Lines of code | ~512,664 |
| Largest single file | `main.tsx` (4,683 lines) |
| Built-in tools | 40+ |
| Slash commands | 80+ |
| Runtime | Bun (compiled for Node.js >= 18) |
---
## 🏗️ Architecture Overview
### Core Loop
```
User → messages[] → Claude API → response
↓
stop_reason == "tool_use" ?
/ \
yes no
↓ ↓
execute tools return text
append result
loop back
```
Around this minimal loop, Claude Code layers on 12 production-grade mechanisms.
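The minimal loop can be sketched as follows (illustrative names and a stubbed API signature, not the actual internals):

```typescript
// Minimal agent loop sketch: call the model, execute any requested
// tool, append the result, and repeat until no tool_use is returned.
type ToolUse = { name: string; input: string };
type ApiResponse = {
  stopReason: "tool_use" | "end_turn";
  text: string;
  toolUse?: ToolUse;
};
type Message = { role: "user" | "assistant" | "tool"; content: string };

function runLoop(
  callApi: (messages: Message[]) => ApiResponse,
  tools: Record<string, (input: string) => string>,
  prompt: string,
): string {
  const messages: Message[] = [{ role: "user", content: prompt }];
  while (true) {
    const res = callApi(messages); // Claude API call
    messages.push({ role: "assistant", content: res.text });
    if (res.stopReason !== "tool_use" || !res.toolUse) {
      return res.text; // no tool requested: return text
    }
    const result = tools[res.toolUse.name](res.toolUse.input); // execute tool
    messages.push({ role: "tool", content: result }); // append result, loop back
  }
}
```

Everything else in the harness (permissions, compaction, streaming) wraps this core.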
### Directory Structure
```
src/
├── main.tsx # REPL bootstrap (4,683 lines)
├── QueryEngine.ts # SDK/headless query lifecycle
├── query.ts # Main agent loop (1,729 lines)
├── Tool.ts # Tool interface + buildTool factory
├── tools.ts # Tool registry
├── commands.ts # Slash command definitions
├── context.ts # User input context
├── cost-tracker.ts # API cost accumulation
├── setup.ts # First-run setup flow
│
├── bridge/ # Claude Desktop / remote bridge
├── cli/ # CLI infrastructure
├── commands/ # ~80 slash commands
├── components/ # React/Ink terminal UI
├── entrypoints/ # Application entry points
├── hooks/ # React hooks
├── services/ # Business logic layer
├── state/ # Application state
├── tasks/ # Task implementations
├── tools/ # 40+ tool implementations
├── types/ # Type definitions
└── utils/ # Utilities (largest directory)
```
---
## 🔧 Tool System Architecture
### Complete Tool Inventory
| Category | Tools |
|------|------|
| **File operations** | FileReadTool, FileEditTool, FileWriteTool, NotebookEditTool, GlobTool, GrepTool |
| **Search & discovery** | ToolSearchTool, WebSearchTool, WebFetchTool |
| **Execution** | BashTool, PowerShellTool |
| **Agents/tasks** | AgentTool, TaskCreateTool, TaskGetTool, TaskUpdateTool, TaskListTool, TaskStopTool |
| **Interaction** | AskUserQuestionTool, BriefTool |
| **Planning** | EnterPlanModeTool, ExitPlanModeTool, EnterWorktreeTool, ExitWorktreeTool, TodoWriteTool |
| **System** | ConfigTool, SkillTool, MCPTool, LSPTool |
| **Communication** | SendMessageTool, TeamCreateTool, TeamDeleteTool |
### Feature-Gated Tools
These tools are dead-code-eliminated from release builds:
| Tool | Feature Gate | Status |
|------|-------------|------|
| REPLTool | `ant` (internal) | Removed at compile time |
| SnipTool | `HISTORY_SNIP` | Removed at compile time |
| SleepTool | `PROACTIVE` / `KAIROS` | Removed at compile time |
| MonitorTool | `MONITOR_TOOL` | Removed at compile time |
| WorkflowTool | `WORKFLOW_SCRIPTS` | Removed at compile time |
| WebBrowserTool | `WEB_BROWSER_TOOL` | Removed at compile time |
| TerminalCaptureTool | `TERMINAL_PANEL` | Removed at compile time |
| CtxInspectTool | `CONTEXT_COLLAPSE` | Removed at compile time |
| ListPeersTool | `UDS_INBOX` | Removed at compile time |
| DiscoverSkillsTool | `EXPERIMENTAL_SKILL_SEARCH` | Removed at compile time |
---
## 🔄 Query Engine Lifecycle
### Data Flow
```
USER INPUT (prompt / slash command)
│
▼
processUserInput() ← parse /commands, build UserMessage
│
▼
fetchSystemPromptParts() ← tools → prompt sections, CLAUDE.md memory
│
▼
recordTranscript() ← persist user message to disk (JSONL)
│
▼
┌─→ normalizeMessagesForAPI() ← strip UI-only fields, compact if needed
│ │
│ ▼
│ Claude API (streaming) ← POST /v1/messages with tools + system prompt
│ │
│ ▼
│ stream events ← message_start → content_block_delta → message_stop
│ │
│ ├─ text block ──────────────→ yield to consumer (SDK / REPL)
│ │
│ └─ tool_use block?
│ │
│ ▼
│ StreamingToolExecutor ← partition: concurrent-safe vs serial
│ │
│ ▼
│ canUseTool() ← permission check (hooks + rules + UI prompt)
│ │
│ ├─ DENY ────────────────→ append tool_result(error), continue loop
│ │
│ └─ ALLOW
│ │
│ ▼
│ tool.call() ← execute the tool
│ │
│ ▼
│ append tool_result ← push to messages[], recordTranscript()
│ │
└─────────┘ ← loop back to API call
│
▼ (stop_reason != "tool_use")
yield result message ← final text, usage, cost, session_id
```
### Concurrency Control
`StreamingToolExecutor` implements smart concurrency:
- **Concurrency-safe tools**: run in parallel
- **Non-concurrent tools**: run exclusively (wait for all other tools to finish)
- Tools execute in arrival order, but results are buffered and emitted in order
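A sketch of that split, under assumed shapes (the real `StreamingToolExecutor` is considerably more involved):

```typescript
interface ToolCall {
  name: string;
  isConcurrencySafe: boolean;
}

// Split requested tool calls into a concurrency-safe batch (run in
// parallel) and a serial batch (run one at a time, exclusively).
function partition(calls: ToolCall[]): { concurrent: ToolCall[]; serial: ToolCall[] } {
  return {
    concurrent: calls.filter((c) => c.isConcurrencySafe), // e.g. reads, greps
    serial: calls.filter((c) => !c.isConcurrencySafe),    // e.g. edits, bash
  };
}

async function executeAll(
  calls: ToolCall[],
  run: (c: ToolCall) => Promise<string>,
): Promise<string[]> {
  const { concurrent, serial } = partition(calls);
  const results = new Map<ToolCall, string>();
  // safe tools run in parallel
  (await Promise.all(concurrent.map(run))).forEach((r, i) => results.set(concurrent[i], r));
  // serial tools run exclusively, one at a time
  for (const c of serial) results.set(c, await run(c));
  // buffered results are emitted in the original request order
  return calls.map((c) => results.get(c)!);
}
```

Buffering into a map and re-reading in request order is what preserves ordered output despite out-of-order completion.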
---
## 📊 Context Compression System
### Three-Tier Compression Strategy
| Strategy | Trigger | Mechanism |
|------|---------|------|
| `autoCompact` | Tokens exceed threshold | Calls the compact API to summarize older messages |
| `snipCompact` | `HISTORY_SNIP` flag | Removes zombie messages and stale markers |
| `contextCollapse` | `CONTEXT_COLLAPSE` flag | Restructures the context |
### Compaction Boundary Marker
```json
{"type":"system","subtype":"compact_boundary"}
```
---
## 🐍 Sub-Agents and Multi-Agent Architecture
### Spawn Modes
| Mode | Characteristics |
|------|------|
| `default` | In-process, shared conversation |
| `fork` | Child process, fresh messages[], shared file cache |
| `worktree` | Isolated git worktree + fork |
| `remote` | Bridged to Claude Code Remote / a container |
### Team Collaboration
```
Lead Agent
├── Teammate A ──> claims Task 1
├── Teammate B ──> claims Task 2
└── Teammate C ──> claims Task 3
Shared: task board, message inbox
Isolated: messages[], file cache, cwd
```
---
## 🔐 Feature Flag System
### Dead Code Elimination
```typescript
feature('FLAG_NAME') ──→ true → included in bundle
──→ false → stripped from bundle
```
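A sketch of the idea (an illustrative stand-in for `feature()`, not the actual `bun:bundle` API):

```typescript
const FLAGS: Record<string, boolean> = { HISTORY_SNIP: false, DAEMON: false };

// Illustrative stand-in: in a real build, feature("X") folds to a
// compile-time constant, so the bundler strips the dead branch and
// everything that only it references.
function feature(name: string): boolean {
  return FLAGS[name] ?? false;
}

function buildToolRegistry(): string[] {
  const tools = ["FileRead", "FileEdit", "Bash"];
  if (feature("HISTORY_SNIP")) {
    tools.push("Snip"); // dead code in release bundles: eliminated entirely
  }
  return tools;
}
```

Because elimination happens at bundle time, the gated code is simply absent from the shipped artifact rather than disabled at runtime.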
### Known Flags
| Flag | Purpose |
|------|------|
| `COORDINATOR_MODE` | Multi-agent coordinator |
| `HISTORY_SNIP` | Aggressive history trimming |
| `CONTEXT_COLLAPSE` | Context restructuring |
| `DAEMON` | Background daemon workers |
| `KAIROS` | Push notifications, file sending |
| `PROACTIVE` | Proactive behavior, sleep tool |
| `VOICE_MODE` | Voice input/output |
| `WORKFLOW_SCRIPTS` | Workflow tools |
| `WEB_BROWSER_TOOL` | Browser automation |
| `MONITOR_TOOL` | MCP monitoring |
| `TERMINAL_PANEL` | Terminal capture |
| `UDS_INBOX` | Peer discovery |
| `EXPERIMENTAL_SKILL_SEARCH` | Skill discovery |
---
## 🕵️ Hidden Features and Codenames
### Model Codename System
| Codename | Role | Evidence |
|------|------|------|
| **Tengu** | Product/telemetry prefix | 250+ analytics events and feature flags |
| **Capybara** | Sonnet family, v8 | `capybara-v2-fast[1m]` |
| **Fennec** | Predecessor of Opus 4.6 | Migration: `fennec-latest` → `opus` |
| **Numbat** | Next-generation model | Comment: "Remove when we launch numbat" |
### Feature Flag Naming
`tengu_` plus a random word pair obscures each flag's purpose:
```
tengu_onyx_plover → Auto Dream
tengu_coral_fern → memdir feature
tengu_herring_clock → Team memory
tengu_frond_boric → Analytics kill switch
tengu_amber_quartz_disabled → Voice mode kill switch
```
### Capybara v8 Behavioral Issues
1. Spurious stop-sequence triggers (~10%)
2. Empty tool_result yields zero output
3. Over-commenting
4. High false-claim rate (29-30% vs 16.7% for v4)
5. Insufficient verification
---
## 🎭 Undercover Mode
Anthropic employees automatically enter undercover mode in public repositories:
```typescript
// Directive: "Do not blow your cover"
// Behavior: strip all AI attribution; write commits in the style of a human developer
```
**It cannot be force-disabled.**
---
## ⏱️ Remote Control
- Polls `/api/claude_code/settings` hourly
- Dangerous changes trigger a blocking dialog (declining quits the app)
- 6+ kill switches
- GrowthBook flags can change behavior without the user's knowledge
---
## 🚀 Future Roadmap
| Codename | Description |
|------|------|
| **Numbat** | Next-generation model |
| **KAIROS** | Fully autonomous agent mode (`<tick>` heartbeat, PR subscriptions, push notifications) |
| **Voice Mode** | Push-to-talk (ready but gated) |
| 17 unreleased tools | Awaiting launch |
---
## ⚠️ 108 Missing Modules
These modules were dead-code-eliminated at compile time and can never be recovered:
### Anthropic-Internal Modules (~70)
- `daemon/main.js`, `daemon/workerRegistry.js`
- `proactive/index.js`, `contextCollapse/index.js`
- `coordinator/workerAgent.js`, `assistant/index.js`
- and more...
### Feature-Gated Tools (~20)
- REPLTool, SnipTool, SleepTool, MonitorTool
- WorkflowTool, WebBrowserTool, TerminalCaptureTool
- and more...
---
## 🎨 Design Patterns
| Pattern | Location | Purpose |
|------|------|------|
| **AsyncGenerator streaming** | `QueryEngine`, `query()` | End-to-end streaming from API to consumer |
| **Builder + Factory** | `buildTool()` | Tool definitions with safe defaults |
| **Branded Types** | `SystemPrompt` | Prevents string/array confusion |
| **Feature Flags + DCE** | `feature()` from `bun:bundle` | Compile-time dead code elimination |
| **Discriminated Unions** | `Message` types | Type-safe message handling |
| **Observer + State Machine** | `StreamingToolExecutor` | Tool execution lifecycle tracking |
| **Snapshot State** | `FileHistoryState` | Undo/redo for file operations |
| **Fire-and-Forget Write** | `recordTranscript()` | Non-blocking persistence |
| **Lazy Schema** | `lazySchema()` | Deferred Zod schema computation |
| **Context Isolation** | `AsyncLocalStorage` | Per-agent context in a shared process |
---
## 📁 Key Files
### query.ts — Main Loop (1,729 lines)
Core logic:
- An infinite `while(true)` loop calling the Claude API
- Checks `stop_reason`
- Executes tools and appends results
- Supports a `maxTurns` limit
- Includes an experimental task budget behind the `TOKEN_BUDGET` feature
### QueryEngine.ts — Query Engine (1,295 lines)
Lifecycle management:
- `submitMessage()` → `AsyncGenerator<SDKMessage>`
- `fetchSystemPromptParts()` → assembles the system prompt
- `processUserInput()` → handles /commands
- `query()` → the main agent loop
### Tool.ts — Tool Interface (792 lines)
```typescript
buildTool(definition) ──> Tool<Input, Output, Progress>
Lifecycle:
├── validateInput() → early reject
├── checkPermissions() → tool-specific authz
└── call() → execute
Rendering (React/Ink):
├── renderToolUseMessage()
├── renderToolResultMessage()
├── renderToolUseProgressMessage()
└── renderGroupedToolUse()
AI-facing:
├── prompt() → tool description for LLM
├── description() → dynamic description
└── mapToolResultToAPI() → format for API response
```
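The factory idea can be sketched like this (hypothetical types, not the real `Tool.ts` definitions):

```typescript
// Builder/factory sketch with safe defaults: optional hooks get a
// default, and the permission default is deny, so a tool definition
// cannot accidentally skip its checks.
interface ToolDefinition<I, O> {
  name: string;
  validateInput?: (input: I) => boolean;
  checkPermissions?: (input: I) => boolean;
  call: (input: I) => O;
}

interface Tool<I, O> {
  name: string;
  run: (input: I) => O;
}

function buildTool<I, O>(def: ToolDefinition<I, O>): Tool<I, O> {
  const validate = def.validateInput ?? (() => true);
  const permit = def.checkPermissions ?? (() => false); // deny by default
  return {
    name: def.name,
    run: (input: I): O => {
      if (!validate(input)) throw new Error(`${def.name}: invalid input`);
      if (!permit(input)) throw new Error(`${def.name}: permission denied`);
      return def.call(input); // lifecycle: validate → permit → execute
    },
  };
}
```

Centralizing the lifecycle in one factory is what keeps 40+ tools behaviorally uniform.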
### main.tsx — REPL Bootstrap (4,683 lines)
The largest single file, containing:
- REPL initialization
- Command handling
- The UI render loop
- Session management
---
## 🔌 MCP Integration
```
MCPConnectionManager.tsx
├── Server Discovery
│ ├── stdio → spawn child process
│ ├── sse → HTTP EventSource
│ ├── http → Streamable HTTP
│ ├── ws → WebSocket
│ └── sdk → in-process transport
├── Authentication
│ ├── OAuth 2.0
│ ├── Cross-App Access (XAA / SEP-990)
│ └── API key via headers
└── Tool Registration
├── mcp__<server>__<tool> naming
├── Dynamic schema
└── Permission passthrough
```
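The `mcp__<server>__<tool>` naming convention can be sketched as follows (the parsing rule is an assumption based on the pattern, not verified behavior):

```typescript
// Build and parse MCP tool names in the shared tool namespace.
function mcpToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

function parseMcpToolName(name: string): { server: string; tool: string } | null {
  // Lazy match: the server is the shortest segment before the next "__".
  const m = /^mcp__(.+?)__(.+)$/.exec(name);
  return m ? { server: m[1], tool: m[2] } : null;
}
```

The prefix lets a single tool registry distinguish MCP-provided tools from built-ins and route calls back to the right server.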
---
## 📈 The 12-Layer Progressive Harness
```
s01 THE LOOP "One loop & Bash is all you need"
s02 TOOL DISPATCH "Adding a tool = adding one handler"
s03 PLANNING "An agent without a plan drifts"
s04 SUB-AGENTS "Break big tasks; clean context per subtask"
s05 KNOWLEDGE ON DEMAND "Load knowledge when you need it"
s06 CONTEXT COMPRESSION "Context fills up; make room"
s07 PERSISTENT TASKS "Big goals → small tasks → disk"
s08 BACKGROUND TASKS "Slow ops in background; agent keeps thinking"
s09 AGENT TEAMS "Too big for one → delegate to teammates"
s10 TEAM PROTOCOLS "Shared communication rules"
s11 AUTONOMOUS AGENTS "Teammates scan and claim tasks themselves"
s12 WORKTREE ISOLATION "Each works in its own directory"
```
---
## 💾 Session Persistence
```
~/.claude/projects/<hash>/sessions/
└── <session-id>.jsonl ← append-only log
├── {"type":"user",...}
├── {"type":"assistant",...}
├── {"type":"progress",...}
└── {"type":"system","subtype":"compact_boundary",...}
RESUME FLOW:
getLastSessionLog() ──> parse JSONL ──> rebuild messages[]
│
├── --continue → last session in cwd
├── --resume <id> → specific session
└── --fork-session → new ID, copy history
```
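The rebuild step of the resume flow can be sketched as follows (hypothetical field names):

```typescript
interface LogEntry {
  type: string;
  subtype?: string;
  content?: string;
}

// Parse an append-only JSONL transcript and rebuild messages[],
// keeping only conversation turns and dropping UI-only entries
// such as progress events.
function rebuildMessages(jsonl: string): LogEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0) // skip blank trailing lines
    .map((line) => JSON.parse(line) as LogEntry)
    .filter((e) => e.type === "user" || e.type === "assistant");
}
```

Because the log is append-only, resuming is a pure replay; forking a session just copies the history under a new ID.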
---
## 🔑 Key Insights
1. **Compile time vs. runtime**: a large share of logic is fixed at compile time (`feature()`) and cannot change at runtime
2. **Dual telemetry pipeline**: 1P → Anthropic + Datadog, with no opt-out exposed in the UI
3. **Employee privilege**: code paths for `USER_TYPE === 'ant'` receive markedly better treatment
4. **Risky remote control**: GrowthBook flags can change behavior without the user's knowledge
5. **Irreversible DCE**: 108 modules are gone for good and cannot be recovered
Good, I have the note. Let me present it to Yelo in a concise way since the full content is quite long.
Yelo wants me to explain each point in detail. Let me go through the key findings one by one with more depth.
Yelo wants me to dig deeper into:
- Permission System
- MCP Integration
- Sub-agent Mechanism
Let me read the relevant source files.
Yelo wants to explore the Coordinator design in more depth. Let me read the coordinator-related source files.
\`\`\`
// Correction — worker just reported test failures from its own change, keep it brief
${SEND_MESSAGE_TOOL_NAME}({ to: "xyz-456", message: "Two tests still failing at lines 58 and 72 — update the assertions to match the new error message." })
\`\`\`
### Prompt tips
**Good examples:**
1. Implementation: "Fix the null pointer in src/auth/validate.ts:42. The user field can be undefined when the session expires. Add a null check and return early with an appropriate error. Commit and report the hash."
2. Precise git operation: "Create a new branch from main called 'fix/session-expiry'. Cherry-pick only commit abc123 onto it. Push and create a draft PR targeting main. Add anthropics/claude-code as reviewer. Report the PR URL."
3. Correction (continued worker, short): "The tests failed on the null check you added — validate.test.ts:58 expects 'Invalid session' but you changed it to 'Session expired'. Fix the assertion. Commit and report the hash."
**Bad examples:**
1. "Fix the bug we discussed" — no context, workers can't see your conversation
2. "Based on your findings, implement the fix" — lazy delegation; synthesize the findings yourself
3. "Create a PR for the recent changes" — ambiguous scope: which changes? which branch? draft?
4. "Something went wrong with the tests, can you look?" — no error message, no file path, no direction
Additional tips:
- Include file paths, line numbers, error messages — workers start fresh and need complete context
- State what "done" looks like
- For implementation: "Run relevant tests and typecheck, then commit your changes and report the hash" — workers self-verify before reporting done. This is the first layer of QA; a separate verification worker is the second layer.
- For research: "Report findings — do not modify files"
- Be precise about git operations — specify branch names, commit hashes, draft vs ready, reviewers
- When continuing for corrections: reference what the worker did ("the null check you added") not what you discussed with the user
- For implementation: "Fix the root cause, not the symptom" — guide workers toward durable fixes
- For verification: "Prove the code works, don't just confirm it exists"
- For verification: "Try edge cases and error paths — don't just re-run what the implementation worker ran"
- For verification: "Investigate failures — don't dismiss as unrelated without evidence"
## 6. Example Session
User: "There's a null pointer in the auth module. Can you fix it?"
You:
Let me investigate first.
${AGENT_TOOL_NAME}({ description: "Investigate auth bug", subagent_type: "worker", prompt: "Investigate the auth module in src/auth/. Find where null pointer exceptions could occur around session handling and token validation... Report specific file paths, line numbers, and types involved. Do not modify files." })
${AGENT_TOOL_NAME}({ description: "Research auth tests", subagent_type: "worker", prompt: "Find all test files related to src/auth/. Report the test structure, what's covered, and any gaps around session expiry... Do not modify files." })
Investigating from two angles — I'll report back with findings.
User:
<task-notification>
<task-id>agent-a1b</task-id>
<status>completed</status>
<summary>Agent "Investigate auth bug" completed</summary>
<result>Found null pointer in src/auth/validate.ts:42. The user field on Session is undefined when the session expires but ...</result>
</task-notification>
You:
Found the bug — null pointer in validate.ts:42.
${SEND_MESSAGE_TOOL_NAME}({ to: "agent-a1b", message: "Fix the null pointer in src/auth/validate.ts:42. Add a null check before accessing user.id — if null, ... Commit and report the hash." })
Fix is in progress.
User:
How's it going?
You:
Fix for the new test is in progress. Still waiting to hear back about the test suite.`
}
Excellent! I now have a comprehensive view of the Coordinator design. Let me compile a deep analysis.
Yelo is asking if Coordinator mode and Agent Team are the same thing. Let me look at the TeamCreateTool and related team functionality to understand the distinction.
Let me read the TeamCreateTool.ts file.
export function getPrompt(): string {
return `
# TeamCreate
## When to Use
Use this tool proactively whenever:
- The user explicitly asks to use a team, swarm, or group of agents
- The user mentions wanting agents to work together, coordinate, or collaborate
- A task is complex enough that it would benefit from parallel work by multiple agents (e.g., building a full-stack feature with frontend and backend work, refactoring a codebase while keeping tests passing, implementing a multi-step project with research, planning, and coding phases)
When in doubt about whether a task warrants a team, prefer spawning a team.
## Choosing Agent Types for Teammates
When spawning teammates via the Agent tool, choose the \`subagent_type\` based on what tools the agent needs for its task. Each agent type has a different set of available tools — match the agent to the work:
- **Read-only agents** (e.g., Explore, Plan) cannot edit or write files. Only assign them research, search, or planning tasks. Never assign them implementation work.
- **Full-capability agents** (e.g., general-purpose) have access to all tools including file editing, writing, and bash. Use these for tasks that require making changes.
- **Custom agents** defined in \`.claude/agents/\` may have their own tool restrictions. Check their descriptions to understand what they can and cannot do.
Always review the agent type descriptions and their available tools listed in the Agent tool prompt before selecting a \`subagent_type\` for a teammate.
Create a new team to coordinate multiple agents working on a project. Teams have a 1:1 correspondence with task lists (Team = TaskList).
\`\`\`
{
"team_name": "my-project",
"description": "Working on feature X"
}
\`\`\`
This creates:
- A team file at \`~/.claude/teams/{team-name}/config.json\`
- A corresponding task list directory at \`~/.claude/tasks/{team-name}/\`
## Team Workflow
1. **Create a team** with TeamCreate - this creates both the team and its task list
2. **Create tasks** using the Task tools (TaskCreate, TaskList, etc.) - they automatically use the team's task list
3. **Spawn teammates** using the Agent tool with \`team_name\` and \`name\` parameters to create teammates that join the team
4. **Assign tasks** using TaskUpdate with \`owner\` to give tasks to idle teammates
5. **Teammates work on assigned tasks** and mark them completed via TaskUpdate
6. **Teammates go idle between turns** - after each turn, teammates automatically go idle and send a notification. IMPORTANT: Be patient with idle teammates! Don't comment on their idleness until it actually impacts your work.
7. **Shutdown your team** - when the task is completed, gracefully shut down your teammates via SendMessage with \`message: {type: "shutdown_request"}\`.
## Task Ownership
Tasks are assigned using TaskUpdate with the \`owner\` parameter. Any agent can set or change task ownership via TaskUpdate.
## Automatic Message Delivery
**IMPORTANT**: Messages from teammates are automatically delivered to you. You do NOT need to manually check your inbox.
When you spawn teammates:
- They will send you messages when they complete tasks or need help
- These messages appear automatically as new conversation turns (like user messages)
- If you're busy (mid-turn), messages are queued and delivered when your turn ends
- The UI shows a brief notification with the sender's name when messages are waiting
Messages will be delivered automatically.
When reporting on teammate messages, you do NOT need to quote the original message—it's already rendered to the user.
## Teammate Idle State
Teammates go idle after every turn—this is completely normal and expected. A teammate going idle immediately after sending you a message does NOT mean they are done or unavailable. Idle simply means they are waiting for input.
- **Idle teammates can receive messages.** Sending a message to an idle teammate wakes them up and they will process it normally.
- **Idle notifications are automatic.** The system sends an idle notification whenever a teammate's turn ends. You do not need to react to idle notifications unless you want to assign new work or send a follow-up message.
- **Do not treat idle as an error.** A teammate sending a message and then going idle is the normal flow—they sent their message and are now waiting for a response.
- **Peer DM visibility.** When a teammate sends a DM to another teammate, a brief summary is included in their idle notification. This gives you visibility into peer collaboration without the full message content. You do not need to respond to these summaries — they are informational.
## Discovering Team Members
Teammates can read the team config file to discover other team members:
- **Team config location**: \`~/.claude/teams/{team-name}/config.json\`
The config file contains a \`members\` array with each teammate's:
- \`name\`: Human-readable name (**always use this** for messaging and task assignment)
- \`agentId\`: Unique identifier (for reference only - do not use for communication)
- \`agentType\`: Role/type of the agent
**IMPORTANT**: Always refer to teammates by their NAME (e.g., "team-lead", "researcher", "tester"). Names are used for:
- \`to\` when sending messages
- Identifying task owners
Example of reading team config:
\`\`\`
Use the Read tool to read ~/.claude/teams/{team-name}/config.json
\`\`\`
## Task List Coordination
Teams share a task list that all teammates can access at \`~/.claude/tasks/{team-name}/\`.
Teammates should:
1. Check TaskList periodically, **especially after completing each task**, to find available work or see newly unblocked tasks
2. Claim unassigned, unblocked tasks with TaskUpdate (set \`owner\` to your name). **Prefer tasks in ID order** (lowest ID first) when multiple tasks are available, as earlier tasks often set up context for later ones
3. Create new tasks with \`TaskCreate\` when identifying additional work
4. Mark tasks as completed with \`TaskUpdate\` when done, then check TaskList for next work
5. Coordinate with other teammates by reading the task list status
6. If all available tasks are blocked, notify the team lead or help resolve blocking tasks
**IMPORTANT notes for communication with your team**:
- Do not use terminal tools to view your team's activity; always send a message to your teammates (and remember, refer to them by name).
- Your team cannot hear you if you do not use the SendMessage tool. Always send a message to your teammates if you are responding to them.
- Do NOT send structured JSON status messages like \`{"type":"idle",...}\` or \`{"type":"task_completed",...}\`. Just communicate in plain text when you need to message teammates.
- Use TaskUpdate to mark tasks completed.
- If you are an agent in the team, the system will automatically send idle notifications to the team lead when you stop.
`.trim()
}
Now I can see the difference between Coordinator and Team. Let me also check the swarm utilities to understand the full picture.
Coordinator mode:
- A feature flag + environment variable
- Uses a specific system prompt that makes the main agent act as a coordinator
- Workers are spawned via the Agent tool with `subagent_type: "worker"`
- Has a specific XML task-notification protocol
- A single main agent coordinates multiple workers
Agent Teams (TeamCreate tool):
- A separate feature (Agent Swarms)
- Creates a "team" with a team lead and teammates
- Team lead and teammates are all agents
- Has a persistent team file and task list
- Teammates go idle between turns (normal behavior)
- Messages are automatically delivered
- Shutdown via SendMessage with `type: "shutdown_request"`
So they are NOT the same:
- Coordinator mode: one main agent orchestrating workers that are closer to temporary subtasks. Workers are ephemeral.
- Agent Teams: a persistent team structure with a team lead plus named teammates. Teammates have persistent identity, can go idle and be woken up, and share a task list.
Let me confirm this understanding is correct and present it to Yelo.
Yelo wants a comparison of the three multi-agent patterns: Coordinator, Agent Teams (TeamCreate), and SubAgent (AgentTool with fork). Let me synthesize what I've learned from the code and provide a clear comparison.
Yelo wants me to read an article on HackerNews about Claude Code source code analysis. Let me use the agent-reach skill to fetch this page.
Command:
curl -s "https://r.jina.ai/https://ccunpacked.dev/" | head -300
`__REDACTED__`
The site is heavily JavaScript-rendered. Let me try fetching the raw HTML to see if there's more content in the initial HTML, or try to get more specific sections.
Command:
curl -sL "https://ccunpacked.dev/" 2>/dev/null | wc -l
300
`__REDACTED__`
Now I have a much better picture of the site. It's an interactive Astro site that maps Claude Code's source code. Let me summarize what I've learned from this page for Yelo.
Key insights from the HTML:
- Agent Loop - 11 steps: Input → Message → History → System → API → Tokens → Tools? → Loop → Render → Hooks → Await
- Architecture Explorer - Treemap showing directory sizes:
  - utils/ (564 files) - largest
  - components/ (389 files) - React/Ink UI
  - commands/ (189 files) - 95 CLI commands
  - tools/ (184 files) - 42 built-in tools
  - services/ (130 files) - Core services
  - hooks/ (104 files)
  - ink/ (96 files) - Ink framework
  - bridge/ (31 files) - Remote control
  - constants/ (21 files)
  - skills/ (20 files)
  - cli/ (19 files)
  - And smaller dirs...
- Tool System - 42 built-in tools categorized:
  - File Operations (6 tools)
  - Execution (3 tools) - Bash, PowerShell, REPL
  - Search & Fetch (4 tools) - WebBrowser🔒, WebFetch, WebSearch, ToolSearch
  - Agents & Tasks (11 tools) - Agent, SendMessage, TaskCreate, etc.
  - Planning (5 tools)
  - MCP (4 tools)
  - System (11 tools)
  - Experimental (8 tools)
- Command Catalog - 95 slash commands:
  - Setup & Config: 12 commands
  - Daily Workflow: 24 commands
  - Code Review & Git: 13 commands
  - Debugging & Diagnostics: 23 commands
  - Advanced & Experimental: 23 commands
- Hidden Features - 8 unreleased features:
  - Buddy: A virtual pet with species and rarity based on account ID
  - Kairos: Persistent mode with daily logs, memory consolidation, autonomous background actions
  - UltraPlan: Long planning sessions on Opus models with 30-minute execution windows
  - Coordinator Mode: Lead agent spawns parallel workers in isolated git worktrees
  - Bridge: Control Claude Code from phone or browser with full remote session and permission approvals
  - Daemon Mode: Background sessions using tmux
  - UDS Inbox: Inter-process communication via Unix domain sockets for session-to-session messaging
  - Auto-Dream: AI reviews sessions between runs and organizes learnings
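The 11-step agent loop listed above can be sketched as a minimal async loop. Every function and type here is a hypothetical placeholder standing in for the real machinery, not the actual Claude Code implementation; the comments map the sketch back to the site's step names.

```typescript
// Minimal sketch of the agent loop; all names are illustrative
// placeholders, not the actual Claude Code source.
type Message = { role: "user" | "assistant" | "tool"; content: string };

interface Turn {
  text: string;
  toolCalls: string[]; // names of tools the model asked to run
}

async function agentLoop(
  input: string,
  callModel: (history: Message[]) => Promise<Turn>,
  runTool: (name: string) => Promise<string>,
): Promise<Message[]> {
  // Input → Message → History
  const history: Message[] = [{ role: "user", content: input }];
  for (;;) {
    // System prompt assembly, API call, and token accounting live here
    const turn = await callModel(history);
    history.push({ role: "assistant", content: turn.text });
    // Tools? — no: exit the loop (Render → Hooks → Await would follow)
    if (turn.toolCalls.length === 0) break;
    // Tools? — yes: dispatch each tool, feed results back, and loop
    for (const name of turn.toolCalls) {
      const result = await runTool(name);
      history.push({ role: "tool", content: result });
    }
  }
  return history;
}
```

The key property the sketch captures is that tool results re-enter the same history the next model call sees, which is what makes the 11 steps a loop rather than a pipeline.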
The site was created by zackautocracy based on a source shared by @Fried_rice, and analysis is dated March 31, 2026.
Yelo wants to discuss how Claude Code CLI is designed as an "Agent Harness" - meaning how it provides the infrastructure and scaffolding that enables the agent to function effectively. This is a design philosophy/architecture discussion.
Let me think about what makes Claude Code a good "harness" for agents:
- The 12-layer progressive harness - from the loop to tool dispatch to planning to sub-agents etc.
- Tool system abstraction - uniform interface for all tools
- Permission system - safety without blocking productivity
- Context management - compression, persistence, etc.
- Stream processing - async generator pattern for real-time feedback
- State management - AppState, file cache, session persistence
- CLI as the control plane - the terminal is the interface between human and agent
This is more of a conceptual discussion, so I should provide a thoughtful analysis based on what we've learned from the codebase.
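The "uniform interface for all tools" point can be made concrete with a minimal sketch. This is a hypothetical shape, assuming only the properties discussed above (validation, permission gating, execution); the interface and field names are illustrative, not the actual source definitions.

```typescript
// Hypothetical sketch of a uniform tool interface; names are
// illustrative, not the actual Claude Code definitions.
interface ToolDefinition<In, Out> {
  name: string;
  description: string;                  // surfaced to the model
  validate(input: unknown): In;         // schema validation before execution
  needsPermission(input: In): boolean;  // hook for the permission system
  call(input: In): Promise<Out>;        // the actual effect
}

// Example instance: a read-only "echo" tool that never needs permission.
const echoTool: ToolDefinition<{ text: string }, string> = {
  name: "Echo",
  description: "Returns its input unchanged.",
  validate(input) {
    if (
      typeof input !== "object" ||
      input === null ||
      typeof (input as { text?: unknown }).text !== "string"
    ) {
      throw new Error("Echo expects { text: string }");
    }
    return input as { text: string };
  },
  needsPermission: () => false,
  call: async ({ text }) => text,
};
```

The value of this shape is that the dispatcher, permission prompts, and result rendering can treat every tool identically; adding a tool means implementing one interface, not touching the harness.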
Yelo is asking about how Claude Code does result evaluation - how it assesses whether the agent's outputs/actions are correct, successful, or high quality. Let me think about what I know from the source code analysis about evaluation mechanisms.
From what I've seen in the code:
- Verification mechanisms - There's a VerifyPlanExecution tool (feature-gated)
- Task output tracking - TaskOutput tool captures results
- Usage/Token tracking - cost-tracker.ts tracks API usage
- Permission denial tracking - tracks what was denied
- Error handling - various error types and recovery mechanisms
- Test running - Bash tool can run tests
- Human feedback loops - permission prompts, AskUserQuestion tool
- Self-verification - in Coordinator mode, there's emphasis on "prove it works"
- Compact/budget mechanisms - token budget tracking
- Tool result validation - schema validation
Let me think about what evaluation means in this context:
- Execution success/failure - did the tool execute successfully?
- Semantic correctness - does the result make sense for the task?
- Quality - is the code good, are the changes appropriate?
- Completeness - was the full task completed?
- Efficiency - did it use reasonable resources?
Claude Code doesn't have a built-in LLM-based evaluator like some agent frameworks do. Instead it relies on:
- Tool return codes/status
- Human feedback (permissions, AskUserQuestion)
- User verification (running tests manually, reviewing changes)
- Hooks for custom evaluation
- The Coordinator mode's emphasis on verification workers
There's also the transcript system for replay and the thinkback feature for reviewing reasoning.
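The layered evaluation described above (tool status first, schema checks next, human feedback as the final arbiter) can be sketched as a small decision function. This is my illustrative model of the layering, not code from the source; the field and function names are hypothetical.

```typescript
// Hypothetical sketch of layered result evaluation: cheap structural
// checks first, deferring to a human when there is no hard evidence.
// All names are illustrative, not from the Claude Code source.
type Evaluation = "pass" | "fail" | "needs-human";

interface ToolResult {
  exitCode: number;     // execution success/failure
  schemaValid: boolean; // did the output match the declared schema?
  testsRan: boolean;    // did an independent check actually execute?
}

function evaluateResult(r: ToolResult): Evaluation {
  if (r.exitCode !== 0) return "fail";   // hard execution failure
  if (!r.schemaValid) return "fail";     // malformed tool output
  if (!r.testsRan) return "needs-human"; // no independent evidence: defer
  return "pass";
}
```

The sketch makes the design trade-off visible: the cheap mechanical layers can only prove failure, so anything they cannot falsify either needs independent evidence (tests, a verification worker) or escalates to the human.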
Yelo wants me to dig deeper into the Coordinator's Evaluator mechanism. From what we've seen so far, Coordinator has a verification phase with a verifier agent, but let me look more carefully at the code to understand how evaluation works in the Coordinator context.
From the coordinatorMode.ts we read earlier, I saw references to:
- Verification phase in the task workflow
- VerifyPlanExecution tool (feature-gated)
- The emphasis on "prove it works, don't just confirm it exists"
But I don't recall seeing a dedicated "Evaluator" component. Let me search for this in the codebase to see if there's a specific Evaluator agent or evaluation mechanism in Coordinator.
Let me search for "evaluator" in the codebase and also look at the task verification related code.
Command:
find ~/projects/claude-code-source-code/src -name "*verif*" -o -name "*evaluat*" 2>/dev/null | head -20
`__REDACTED__`
import { BASH_TOOL_NAME } from 'src/tools/BashTool/toolName.js'
import { EXIT_PLAN_MODE_TOOL_NAME } from 'src/tools/ExitPlanModeTool/constants.js'
import { FILE_EDIT_TOOL_NAME } from 'src/tools/FileEditTool/constants.js'
import { FILE_WRITE_TOOL_NAME } from 'src/tools/FileWriteTool/prompt.js'
import { NOTEBOOK_EDIT_TOOL_NAME } from 'src/tools/NotebookEditTool/constants.js'
import { WEB_FETCH_TOOL_NAME } from 'src/tools/WebFetchTool/prompt.js'
import { AGENT_TOOL_NAME } from '../constants.js'
import type { BuiltInAgentDefinition } from '../loadAgentsDir.js'
const VERIFICATION_SYSTEM_PROMPT = `You are a verification specialist. Your job is not to confirm the implementation works — it's to try to break it.
You have two documented failure patterns. First, verification avoidance: when faced with a check, you find reasons not to run it — you read code, narrate what you would test, write "PASS," and move on. Second, being seduced by the first 80%: you see a polished UI or a passing test suite and feel inclined to pass it, not noticing half the buttons do nothing, the state vanishes on refresh, or the backend crashes on bad input. The first 80% is the easy part. Your entire value is in finding the last 20%. The caller may spot-check your commands by re-running them — if a PASS step has no command output, or output that doesn't match re-execution, your report gets rejected.
=== CRITICAL: DO NOT MODIFY THE PROJECT ===
You are STRICTLY PROHIBITED from:
- Creating, modifying, or deleting any files IN THE PROJECT DIRECTORY
- Installing dependencies or packages
- Running git write operations (add, commit, push)
You MAY write ephemeral test scripts to a temp directory (/tmp or $TMPDIR) via ${BASH_TOOL_NAME} redirection when inline commands aren't sufficient — e.g., a multi-step race harness or a Playwright test. Clean up after yourself.
Check your ACTUAL available tools rather than assuming from this prompt. You may have browser automation (mcp__claude-in-chrome__*, mcp__playwright__*), ${WEB_FETCH_TOOL_NAME}, or other MCP tools depending on the session — do not skip capabilities you didn't think to check for.
=== WHAT YOU RECEIVE ===
You will receive: the original task description, files changed, approach taken, and optionally a plan file path.
=== VERIFICATION STRATEGY ===
Adapt your strategy based on what was changed:
**Frontend changes**: Start dev server → check your tools for browser automation (mcp__claude-in-chrome__*, mcp__playwright__*) and USE them to navigate, screenshot, click, and read console — do NOT say "needs a real browser" without attempting → curl a sample of page subresources (image-optimizer URLs like /_next/image, same-origin API routes, static assets) since HTML can serve 200 while everything it references fails → run frontend tests
**Backend/API changes**: Start server → curl/fetch endpoints → verify response shapes against expected values (not just status codes) → test error handling → check edge cases
**CLI/script changes**: Run with representative inputs → verify stdout/stderr/exit codes → test edge inputs (empty, malformed, boundary) → verify --help / usage output is accurate
**Infrastructure/config changes**: Validate syntax → dry-run where possible (terraform plan, kubectl apply --dry-run=server, docker build, nginx -t) → check env vars / secrets are actually referenced, not just defined
**Library/package changes**: Build → full test suite → import the library from a fresh context and exercise the public API as a consumer would → verify exported types match README/docs examples
**Bug fixes**: Reproduce the original bug → verify fix → run regression tests → check related functionality for side effects
**Mobile (iOS/Android)**: Clean build → install on simulator/emulator → dump accessibility/UI tree (idb ui describe-all / uiautomator dump), find elements by label, tap by tree coords, re-dump to verify; screenshots secondary → kill and relaunch to test persistence → check crash logs (logcat / device console)
**Data/ML pipeline**: Run with sample input → verify output shape/schema/types → test empty input, single row, NaN/null handling → check for silent data loss (row counts in vs out)
**Database migrations**: Run migration up → verify schema matches intent → run migration down (reversibility) → test against existing data, not just empty DB
**Refactoring (no behavior change)**: Existing test suite MUST pass unchanged → diff the public API surface (no new/removed exports) → spot-check observable behavior is identical (same inputs → same outputs)
**Other change types**: The pattern is always the same — (a) figure out how to exercise this change directly (run/call/invoke/deploy it), (b) check outputs against expectations, (c) try to break it with inputs/conditions the implementer didn't test. The strategies above are worked examples for common cases.
=== REQUIRED STEPS (universal baseline) ===
1. Read the project's CLAUDE.md / README for build/test commands and conventions. Check package.json / Makefile / pyproject.toml for script names. If the implementer pointed you to a plan or spec file, read it — that's the success criteria.
2. Run the build (if applicable). A broken build is an automatic FAIL.
3. Run the project's test suite (if it has one). Failing tests are an automatic FAIL.
4. Run linters/type-checkers if configured (eslint, tsc, mypy, etc.).
5. Check for regressions in related code.
Then apply the type-specific strategy above. Match rigor to stakes: a one-off script doesn't need race-condition probes; production payments code needs everything.
Test suite results are context, not evidence. Run the suite, note pass/fail, then move on to your real verification. The implementer is an LLM too — its tests may be heavy on mocks, circular assertions, or happy-path coverage that proves nothing about whether the system actually works end-to-end.
=== RECOGNIZE YOUR OWN RATIONALIZATIONS ===
You will feel the urge to skip checks. These are the exact excuses you reach for — recognize them and do the opposite:
- "The code looks correct based on my reading" — reading is not verification. Run it.
- "The implementer's tests already pass" — the implementer is an LLM. Verify independently.
- "This is probably fine" — probably is not verified. Run it.
- "Let me start the server and check the code" — no. Start the server and hit the endpoint.
- "I don't have a browser" — did you actually check for mcp__claude-in-chrome__* / mcp__playwright__*? If present, use them. If an MCP tool fails, troubleshoot (server running? selector right?). The fallback exists so you don't invent your own "can't do this" story.
- "This would take too long" — not your call.
If you catch yourself writing an explanation instead of a command, stop. Run the command.
=== ADVERSARIAL PROBES (adapt to the change type) ===
Functional tests confirm the happy path. Also try to break it:
- **Concurrency** (servers/APIs): parallel requests to create-if-not-exists paths — duplicate sessions? lost writes?
- **Boundary values**: 0, -1, empty string, very long strings, unicode, MAX_INT
- **Idempotency**: same mutating request twice — duplicate created? error? correct no-op?
- **Orphan operations**: delete/reference IDs that don't exist
These are seeds, not a checklist — pick the ones that fit what you're verifying.
=== BEFORE ISSUING PASS ===
Your report must include at least one adversarial probe you ran (concurrency, boundary, idempotency, orphan op, or similar) and its result — even if the result was "handled correctly." If all your checks are "returns 200" or "test suite passes," you have confirmed the happy path, not verified correctness. Go back and try to break something.
=== BEFORE ISSUING FAIL ===
You found something that looks broken. Before reporting FAIL, check you haven't missed why it's actually fine:
- **Already handled**: is there defensive code elsewhere (validation upstream, error recovery downstream) that prevents this?
- **Intentional**: does CLAUDE.md / comments / commit message explain this as deliberate?
- **Not actionable**: is this a real limitation but unfixable without breaking an external contract (stable API, protocol spec, backwards compat)? If so, note it as an observation, not a FAIL — a "bug" that can't be fixed isn't actionable.
Don't use these as excuses to wave away real issues — but don't FAIL on intentional behavior either.
=== OUTPUT FORMAT (REQUIRED) ===
Every check MUST follow this structure. A check without a Command run block is not a PASS — it's a skip.
\`\`\`
### Check: [what you're verifying]
**Command run:**
[exact command you executed]
**Output observed:**
[actual terminal output — copy-paste, not paraphrased. Truncate if very long but keep the relevant part.]
**Result: PASS** (or FAIL — with Expected vs Actual)
\`\`\`
Bad (rejected):
\`\`\`
### Check: POST /api/register validation
**Result: PASS**
Evidence: Reviewed the route handler in routes/auth.py. The logic correctly validates
email format and password length before DB insert.
\`\`\`
(No command run. Reading code is not verification.)
Good:
\`\`\`
### Check: POST /api/register rejects short password
**Command run:**
curl -s -X POST localhost:8000/api/register -H 'Content-Type: application/json' \\
-d '{"email":"t@t.co","password":"short"}' | python3 -m json.tool
**Output observed:**
{
"error": "password must be at least 8 characters"
}
(HTTP 400)
**Expected vs Actual:** Expected 400 with password-length error. Got exactly that.
**Result: PASS**
\`\`\`
End with exactly this line (parsed by caller):
VERDICT: PASS
or
VERDICT: FAIL
or
VERDICT: PARTIAL
PARTIAL is for environmental limitations only (no test framework, tool unavailable, server can't start) — not for "I'm unsure whether this is a bug." If you can run the check, you must decide PASS or FAIL.
Use the literal string \`VERDICT: \` followed by exactly one of \`PASS\`, \`FAIL\`, \`PARTIAL\`. No markdown bold, no punctuation, no variation.
- **FAIL**: include what failed, exact error output, reproduction steps.
- **PARTIAL**: what was verified, what could not be and why (missing tool/env), what the implementer should know.`
const VERIFICATION_WHEN_TO_USE =
'Use this agent to verify that implementation work is correct before reporting completion. Invoke after non-trivial tasks (3+ file edits, backend/API changes, infrastructure changes). Pass the ORIGINAL user task description, list of files changed, and approach taken. The agent runs builds, tests, linters, and checks to produce a PASS/FAIL/PARTIAL verdict with evidence.'
export const VERIFICATION_AGENT: BuiltInAgentDefinition = {
agentType: 'verification',
whenToUse: VERIFICATION_WHEN_TO_USE,
color: 'red',
background: true,
disallowedTools: [
AGENT_TOOL_NAME,
EXIT_PLAN_MODE_TOOL_NAME,
FILE_EDIT_TOOL_NAME,
FILE_WRITE_TOOL_NAME,
NOTEBOOK_EDIT_TOOL_NAME,
],
source: 'built-in',
baseDir: 'built-in',
model: 'inherit',
getSystemPrompt: () => VERIFICATION_SYSTEM_PROMPT,
criticalSystemReminder_EXPERIMENTAL:
[3 more lines in file. Use offset=151 to continue.]
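The prompt requires the report to end with a literal `VERDICT: ` line "parsed by caller". A minimal caller-side extractor might look like the following; `parseVerdict` is a hypothetical sketch, not the actual parsing code in the source.

```typescript
// Hypothetical sketch of parsing the verifier's final verdict line.
// The real caller-side parser may differ.
type Verdict = "PASS" | "FAIL" | "PARTIAL";

function parseVerdict(report: string): Verdict | null {
  // Scan from the end: the verdict is required to be the final line.
  const lines = report.trimEnd().split("\n");
  for (let i = lines.length - 1; i >= 0; i--) {
    // Exact literal form: no markdown bold, no punctuation, no variation.
    const m = lines[i].match(/^VERDICT: (PASS|FAIL|PARTIAL)$/);
    if (m) return m[1] as Verdict;
  }
  return null; // no well-formed verdict: treat the report as rejected
}
```

The strict regex is the point: by forbidding variation in the prompt, the caller can use a trivial anchored match instead of fuzzy interpretation, and a missing match becomes an unambiguous rejection signal.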
Excellent! Found the verification agent. Let me read more of the verificationAgent.ts file and also look at the verifyContent.ts to understand the complete picture.
Now I have a comprehensive picture. Let me also check if there's a verify skill SKILL.md to understand the complete verify capability.
Now let me also check the coordinator's own evaluation mechanism - specifically how the coordinator uses verification.
~/projects/claude-code-source-code/src/coordinator/coordinatorMode.ts:When calling ${AGENT_TOOL_NAME}, use subagent_type \`worker\`. Workers execute tasks autonomously — especially research, implementation, or verification.
~/projects/claude-code-source-code/src/coordinator/coordinatorMode.ts:- For implementation: "Run relevant tests and typecheck, then commit your changes and report the hash" — workers self-verify before reporting done. This is the first layer of QA; a separate verification worker is the second layer.
~/projects/claude-code-source-code/src/coordinator/coordinatorMode.ts:- For verification: "Prove the code works, don't just confirm it exists"
~/projects/claude-code-source-code/src/coordinator/coordinatorMode.ts:- For verification: "Try edge cases and error paths — don't just re-run what the implementation worker ran"
~/projects/claude-code-source-code/src/coordinator/coordinatorMode.ts:- For verification: "Investigate failures — don't dismiss as unrelated without evidence"
Now I have a comprehensive picture. Let me write a thorough explanation of the Coordinator's evaluator mechanism.
Key findings:
- There's a built-in verificationAgent with very detailed adversarial testing prompts
- The Coordinator mentions "a separate verification worker is the second layer" - so there's a 2-layer QA system
- The verification agent is quite sophisticated with adversarial probes
- It's feature-gated (USER_TYPE === 'ant') for the skill registration
- The verification agent prohibits modification of project files
Let me write a comprehensive explanation.