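AI Execution Control Architecture

Current Deployment Decision (2026-04-24)

Canonical target: E:\My Project\Atramenti-Console\codex\plugins\obsidian\data\docs\codex-knowledge\AI 执行控制架构.md
Audience: my future self, implementers who take over the cloud MCP work, and operations and deployment staff
Status date: 2026-04-24

This document is no longer only an abstract architecture description; it also records the placement decisions that have already been made.

Settled Conclusions

The industrial-grade cloud MCP will not run as a full execution host on any of the existing public-facing machines.

The layout is now formally fixed as: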

170.106.179.226
  = Gateway / Auth / MCP Router
 
121.196.202.114
  = lightweight remote MCP runtime
 
new heavy worker node
  = industrial cloud MCP executor

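In other words:

170 continues to serve only as the public entry layer
121 continues to serve only as the lightweight remote MCP
A new heavy worker node takes on task orchestration, the browser pool, code execution, restricted shell, and the artifact and log pipeline

Why Not 170

The following has been confirmed on site:

170.106.179.226 is the active production entry machine
Its current role is nginx + API gateway
The production AI gateway path for key.tengokukk.com has already been consolidated on this machine

It is therefore suited to host: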

Gateway
Auth
Rate Limit
MCP Router
Lightweight protocol conversion

But it should not also run:

browser-worker
code-worker
shell-worker
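Browser pool
Sandbox executors
Heavy queue consumers

The reason is not that it "absolutely cannot run them", but rather:

Once the execution layer saturates CPU, RAM, disk, shm, or network, the first thing to suffer is the production public entry point.

This would bind:

Entry-layer stability
Execution-layer risk
Availability of the live model gateway

into a single fault domain.

Why Not 121

121.196.202.114 should continue to serve only as the lightweight remote MCP runtime.

It is suited to: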

context-store
database-ops-mcp
log-diagnose
output-guard
prompt-compressor
repo-inspector
ripgrep-search

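It is not suited to being upgraded into the industrial-grade main execution pool, for reasons already spelled out in the current operating rules:

It is a lightweight node
Heavy checks and large tests should not run on this machine
It has a known history of SSH / banner timeout constraints

So the right way to use 121 is not to "keep piling on", but to:

Keep its lightweight responsibilities and prevent it from becoming a new coupling point.

Responsibility Boundary of the New Heavy Worker Node

The new node hosts only the execution layer, not the production public entry point.

First batch of responsibilities: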

task-orchestrator
worker-runtime
browser-worker
code-worker
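shell-worker (restricted)
artifact storage (local cache layer)
job logs
Controlled working directory / sandbox directory

Not placed on the heavy node:

TLS termination for the primary domain
The main public entry point
The public-facing entry of the production AI gateway

Final Execution Topology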

Client / Codex / Cline / Roo
  -> 170 gateway
  -> MCP Router
  -> Task Orchestrator
  -> new heavy worker node
       -> browser-worker
       -> code-worker
       -> shell-worker
       -> artifacts / logs
 
121 runtime node
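  -> continues to serve lightweight MCP

Stage One Implementation Checklist

Phase 1: Procurement and preparation

Provision the new heavy worker node
Install Node 22, pnpm, Docker, and docker compose
Create the working root directory, for example /srv/cloud-mcp
Create the runtime directory, for example /srv/cloud-mcp/.runtime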

Phase 2: Land the minimal execution loop first

task-orchestrator
browser-worker
code-worker
redis
postgres
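artifact/log directories

Phase 3: Connect to 170

170 forwards only to the internal control plane
High-risk execution capabilities are not exposed directly to the public internet
Every side-effecting action becomes a task instead of running as a bare synchronous execution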
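The rule that side-effecting actions become tasks can be sketched as a small dispatcher. This is only an illustration under assumptions: ActionRequest, PendingTaskQueue, and dispatch are hypothetical names, and the in-memory array stands in for whatever queue the redis-backed task-orchestrator would actually provide.

```typescript
// Hypothetical shapes for illustration; none of these names come from the deployed gateway.
interface ActionRequest {
  tool: string;
  mutatesState: boolean;
  args: Record<string, unknown>;
}

type DispatchResult =
  | { kind: "sync"; tool: string }
  | { kind: "queued"; tool: string; taskId: string };

// In-memory stand-in for the real task queue (assumption).
class PendingTaskQueue {
  private nextId = 1;
  readonly pending: { taskId: string; request: ActionRequest }[] = [];

  enqueue(request: ActionRequest): string {
    const taskId = `task_${this.nextId++}`;
    this.pending.push({ taskId, request });
    return taskId;
  }
}

// Side-effecting requests are never run inline: they are queued and the caller gets a task id.
function dispatch(request: ActionRequest, queue: PendingTaskQueue): DispatchResult {
  if (request.mutatesState) {
    return { kind: "queued", tool: request.tool, taskId: queue.enqueue(request) };
  }
  return { kind: "sync", tool: request.tool };
}

const queue = new PendingTaskQueue();
const readResult = dispatch(
  { tool: "fs.list_desktop_shortcuts", mutatesState: false, args: {} },
  queue,
);
const writeResult = dispatch(
  { tool: "fs.move_to_recycle_bin", mutatesState: true, args: { path: "C:\\Users\\Public\\Desktop\\old.lnk" } },
  queue,
);
```

With this shape the gateway on 170 only ever returns a task id for mutations; the worker on the heavy node picks the task up later, so the entry layer never blocks on execution.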

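Phase 4: Keep 121 in the lightweight layer

Keep the existing lightweight remote MCP
Do not migrate the browser pool or large build tasks onto it
Take on read-only or lightweight capabilities only when needed

What We Are Not Doing Now

Until the heavy node is in place, the following tightly coupled options are off the table:

Do not pile the full worker pool directly onto 170
Do not temporarily promote 121 to an industrial-grade execution host
Do not use the local Windows workstation as a long-term cloud execution hub

In one sentence:

Within the current server fleet, the best placement is not to force-pick one of the existing machines, but to keep 170 as the entry layer, keep 121 as the lightweight layer, and add a heavy worker node to carry the industrial-grade execution layer.

What follows is the "AI execution control architecture" distilled from your current system. The goal is not only to block dangerous commands, but to improve all of the following at once:

Execution success rate
Trustworthiness of results
Accountability
Extensibility

The conclusion first:

Do not let the AI touch the shell, files, or network directly.

Let the AI touch only "controlled capabilities", and route every real execution through an Execution Control Plane.

1. Overall Shape

Your current setup is roughly: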

Client / Codex
   ↓
Nginx / Gateway
   ↓
AIClient2API
   ↓
Upstream model
   ↓
tool_call
   ↓
shell / file / network

The target to upgrade to:

Client / Codex
   ↓
API Gateway
   ↓
Model Router
   ↓
Execution Control Plane
   ├─ Policy Engine
   ├─ Tool Registry
   ├─ Planner / Validator
   ├─ Risk Scorer
   ├─ Approval Gate
   ├─ Sandboxed Executors
   ├─ State Store
   ├─ Audit / Replay
   └─ Observability
   ↓
Real effects:
   ├─ file ops
   ├─ shell ops
   ├─ network ops
   ├─ browser ops
   └─ MCP / external services

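The core principle in one sentence:

The model proposes actions; the control plane decides whether an action may happen, how it happens, and how it is verified afterwards.

2. Layer Responsibilities

A. API Gateway

Its responsibilities are narrow:

Authentication
Rate limiting
Routing
Request normalization
Stable SSE/stream forwarding

Do not put complex decision logic here.

B. Model Router

Responsibilities:

Model selection
Protocol unification
Normalizing chat/responses/streaming into one internal event stream

Its output should not be a direct "execute this command", but a unified internal structure: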

{
  "session_id": "s_123",
  "turn_id": "t_456",
  "intent": "clean_desktop_shortcuts",
  "proposed_actions": [
    {
      "tool": "fs.delete_shortcut",
      "args": {
        "path": "C:\\Users\\ASUS-KL\\Desktop\\xxx.lnk"
      }
    }
  ],
  "raw_model_output": {}
}

C. Execution Control Plane

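This is the centerpiece.

It consists of 8 submodules.

3. Eight Core Modules

3.1 Tool Registry

Do not expose the shell to the AI directly.
First register capabilities into a "tool capability catalog".

For example: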

fs.list_desktop_shortcuts
fs.inspect_shortcut_target
fs.move_to_recycle_bin
shell.run_powershell_readonly
shell.run_powershell_mutation
net.fetch_http_get
browser.click
browser.type
doc.write_markdown

Every tool carries metadata:

{
  "tool_name": "fs.move_to_recycle_bin",
  "category": "filesystem",
  "risk_level": "medium",
  "mutates_state": true,
  "requires_approval": false,
  "allowed_paths": [
    "C:\\Users\\ASUS-KL\\Desktop",
    "C:\\Users\\Public\\Desktop"
  ],
  "validator": "validate_desktop_path",
  "executor": "windows_fs_executor"
}

Key point:

The AI sees "business tools", not raw commands.
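A registry along these lines might look like the sketch below. The field names mirror the metadata example above, while ToolSpec, ToolRegistry, and its methods are hypothetical names chosen for illustration; the validator and executor fields are left out to keep it short.

```typescript
type RiskLevel = "low" | "medium" | "high";

// Subset of the metadata shown above; validator/executor wiring is omitted in this sketch.
interface ToolSpec {
  tool_name: string;
  category: string;
  risk_level: RiskLevel;
  mutates_state: boolean;
  requires_approval: boolean;
  allowed_paths: string[];
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolSpec>();

  register(spec: ToolSpec): void {
    if (this.tools.has(spec.tool_name)) {
      throw new Error(`duplicate tool: ${spec.tool_name}`);
    }
    this.tools.set(spec.tool_name, spec);
  }

  // Unknown tools resolve to undefined, so callers have to deny by default.
  resolve(toolName: string): ToolSpec | undefined {
    return this.tools.get(toolName);
  }

  // The names the model is allowed to see; raw commands never appear here.
  catalog(): string[] {
    return Array.from(this.tools.keys()).sort();
  }
}

const desktopPaths = ["C:\\Users\\ASUS-KL\\Desktop", "C:\\Users\\Public\\Desktop"];

const registry = new ToolRegistry();
registry.register({
  tool_name: "fs.move_to_recycle_bin",
  category: "filesystem",
  risk_level: "medium",
  mutates_state: true,
  requires_approval: false,
  allowed_paths: desktopPaths,
});
registry.register({
  tool_name: "fs.list_desktop_shortcuts",
  category: "filesystem",
  risk_level: "low",
  mutates_state: false,
  requires_approval: false,
  allowed_paths: desktopPaths,
});
```

A tool the model invents, such as a raw shell call, simply fails to resolve, which keeps the deny-by-default decision in one place.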

3.2 Policy Engine

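This is the core of access control.

It decides:

Whether this session may call this tool
Whether the arguments are valid
Whether execution is allowed in the current context
Whether human confirmation is required
Whether execution must run with reduced privileges

Three policy layers are recommended:

Global policy

For example:

Forbid deleting system directories
Forbid writing to C:\Windows
Forbid network access to domains that are not allowlisted
Forbid executing encoded PowerShell

Workspace policy

For example:

Only allow changes to the current repo
Write documents to the designated knowledge base by default
Only allow touching desktop shortcuts, not other user directories

Task policy

For example:

This task is "clean up broken shortcuts"
Allow inspect -> classify -> recycle
Do not allow shell download or registry edit

Example policy output: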

{
  "decision": "allow_with_constraints",
  "constraints": {
    "max_actions": 20,
    "allowed_tools": [
      "fs.list_desktop_shortcuts",
      "fs.inspect_shortcut_target",
      "fs.move_to_recycle_bin"
    ],
    "allowed_paths": [
      "C:\\Users\\ASUS-KL\\Desktop",
      "C:\\Users\\Public\\Desktop"
    ]
  },
  "reason": "desktop shortcut cleanup scoped task"
}
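The three layers can be evaluated in order, with the first denial winning. The sketch below is a simplified assumption of how that might work: PolicyLayer, evaluatePolicy, and the path helpers are hypothetical names, it returns only allow or deny rather than the richer allow_with_constraints shape above, and the path check is a plain string comparison that does not resolve symlinks or short names.

```typescript
interface ProposedAction {
  tool: string;
  args: { path?: string };
}

interface PolicyLayer {
  name: "global" | "workspace" | "task";
  deniedPathRoots: string[];
  allowedTools?: string[]; // undefined: this layer does not restrict tools
  allowedPathRoots?: string[]; // undefined: this layer does not restrict paths
}

interface PolicyDecision {
  decision: "allow" | "deny";
  reason: string;
}

// Windows paths are case-insensitive; unify separators and strip trailing backslashes.
function normalizePath(p: string): string {
  return p.replace(/\//g, "\\").replace(/\\+$/, "").toLowerCase();
}

// Compare on a separator boundary so "Desktop2" is not treated as inside "Desktop".
function isUnderRoot(path: string, root: string): boolean {
  const p = normalizePath(path);
  const r = normalizePath(root);
  return p === r || p.startsWith(r + "\\");
}

function evaluatePolicy(action: ProposedAction, layers: PolicyLayer[]): PolicyDecision {
  const path = action.args.path;
  if (path !== undefined && path.split(/[\\/]/).includes("..")) {
    return { decision: "deny", reason: "path contains a traversal segment" };
  }
  for (const layer of layers) {
    if (layer.allowedTools && !layer.allowedTools.includes(action.tool)) {
      return { decision: "deny", reason: `${layer.name}: tool not allowed` };
    }
    if (path !== undefined) {
      if (layer.deniedPathRoots.some((root) => isUnderRoot(path, root))) {
        return { decision: "deny", reason: `${layer.name}: path is denied` };
      }
      if (layer.allowedPathRoots && !layer.allowedPathRoots.some((root) => isUnderRoot(path, root))) {
        return { decision: "deny", reason: `${layer.name}: path outside allowed scope` };
      }
    }
  }
  return { decision: "allow", reason: "passed all policy layers" };
}

const layers: PolicyLayer[] = [
  { name: "global", deniedPathRoots: ["C:\\Windows"] },
  { name: "workspace", deniedPathRoots: [], allowedPathRoots: ["C:\\Users\\ASUS-KL\\Desktop", "C:\\Users\\Public\\Desktop"] },
  {
    name: "task",
    deniedPathRoots: [],
    allowedTools: ["fs.list_desktop_shortcuts", "fs.inspect_shortcut_target", "fs.move_to_recycle_bin"],
  },
];
```

Evaluating layers from broadest to narrowest means a task policy can only tighten what the global and workspace layers already permit.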

3.3 Risk Scorer

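Not every allow is equal.

Compute a risk score for each action, along dimensions including:

Tool risk
Argument risk
Path risk
Destructiveness
Irreversibility
External side effects
Whether it deviates from the user's intent
Whether it keeps retrying after repeated failures

Example: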

{
  "tool": "shell.run_powershell_mutation",
  "risk_score": 83,
  "risk_factors": [
    "mutating_shell",
    "touches_user_home",
    "wildcard_delete",
    "low_explainability"
  ]
}

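Thresholds can be configured:

0-30: execute automatically
31-60: requires secondary validation
61-80: requires human confirmation
81+: reject, or decompose and replan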
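The threshold bands map directly onto a small classifier. The function and type names here are hypothetical; only the band boundaries come from the list above, and how the score itself is computed is left open.

```typescript
type RiskHandling =
  | "auto_execute"
  | "secondary_validation"
  | "human_confirmation"
  | "reject_or_replan";

// Maps a risk score to the handling band listed above.
function classifyRisk(score: number): RiskHandling {
  if (!Number.isFinite(score) || score < 0) {
    throw new RangeError(`invalid risk score: ${score}`);
  }
  if (score <= 30) return "auto_execute";
  if (score <= 60) return "secondary_validation";
  if (score <= 80) return "human_confirmation";
  return "reject_or_replan";
}

// The shell mutation example above scored 83, so it lands in the top band.
const shellMutationHandling = classifyRisk(83);
```

Keeping the bands in one function makes it easy to tune thresholds per workspace later without touching the scorer.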

3.4 Planner / Validator

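Have the model propose a plan first and execute afterwards, rather than acting immediately.

Flow:

The model generates a plan
The Planner splits the plan into atomic actions
The Validator checks:
- Whether the arguments are complete
- Whether the action is executable
- Whether it is consistent with the task objective
- Whether it violates the state machine

For example, the desktop cleanup task:

Plan:
1. Enumerate shortcuts
2. Check whether each target exists
3. Mark broken entries
4. Move them to the recycle bin
5. Re-check and report

After atomization: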

[
  {"tool":"fs.list_desktop_shortcuts","args":{"scope":"user_and_public"}},
  {"tool":"fs.inspect_shortcut_target","args":{"path":"...1.lnk"}},
  {"tool":"fs.inspect_shortcut_target","args":{"path":"...2.lnk"}},
  {"tool":"fs.move_to_recycle_bin","args":{"path":"...2.lnk"}}
]

This makes it harder for the AI to skip steps.
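One way a Validator could enforce step order for the desktop cleanup task is sketched below. The ordering rules (list before inspect, inspect before recycle) are an assumption read off the plan above, and validatePlan and AtomicAction are hypothetical names.

```typescript
interface AtomicAction {
  tool: string;
  args: { path?: string; scope?: string };
}

// Returns a list of violations; an empty list means the plan may proceed.
function validatePlan(actions: AtomicAction[]): string[] {
  const errors: string[] = [];
  const inspected = new Set<string>();
  let listed = false;

  actions.forEach((action, index) => {
    const path = action.args.path;
    switch (action.tool) {
      case "fs.list_desktop_shortcuts":
        listed = true;
        break;
      case "fs.inspect_shortcut_target":
        if (!listed) errors.push(`#${index}: inspect before list`);
        if (path === undefined) errors.push(`#${index}: missing path`);
        else inspected.add(path);
        break;
      case "fs.move_to_recycle_bin":
        // Recycling is only valid for a path that an earlier step inspected.
        if (path === undefined || !inspected.has(path)) {
          errors.push(`#${index}: recycle without prior inspect`);
        }
        break;
      default:
        errors.push(`#${index}: tool ${action.tool} is not part of this task`);
    }
  });
  return errors;
}

const goodPlan: AtomicAction[] = [
  { tool: "fs.list_desktop_shortcuts", args: { scope: "user_and_public" } },
  { tool: "fs.inspect_shortcut_target", args: { path: "a.lnk" } },
  { tool: "fs.move_to_recycle_bin", args: { path: "a.lnk" } },
];

const skippedInspect: AtomicAction[] = [
  { tool: "fs.list_desktop_shortcuts", args: { scope: "user_and_public" } },
  { tool: "fs.move_to_recycle_bin", args: { path: "b.lnk" } },
];
```

Here validatePlan(goodPlan) returns an empty list, while validatePlan(skippedInspect) reports the recycle step, so a plan that jumps straight to deletion never reaches the executor.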

3.5 Approval Gate

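High-risk actions must not execute directly.

Three approval tiers are recommended:

Auto

Low-risk actions pass automatically.

Silent checkpoint

The system continues automatically but leaves a checkpoint.
Suited to batches of medium-risk operations.

Human approval

Explicitly requires your sign-off.

For example:

Delete a file: checkpoint
Delete a directory: human confirmation
Download from the network and execute: mandatory human confirmation
Change system proxy or registry: mandatory human confirmation

Approval messages should be structured, not free text: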

{
  "approval_id": "appr_001",
  "summary": "Delete 12 broken desktop shortcuts",
  "risk": "medium",
  "why": [
    "all targets missing",
    "scope limited to desktop",
    "recycle bin used instead of permanent delete"
  ],
  "actions": [
    {"tool":"fs.move_to_recycle_bin","count":12}
  ]
}
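The tier examples above can be captured as a lookup, with a batch inheriting the strictest tier among its actions. ActionKind, TIER_BY_KIND, and tierForBatch are hypothetical names, and mapping read-only actions to the Auto tier is an assumption based on "low-risk actions pass automatically".

```typescript
type ApprovalTier = "auto" | "silent_checkpoint" | "human_approval";

type ActionKind =
  | "read"
  | "delete_file"
  | "delete_directory"
  | "download_and_execute"
  | "system_config_change";

// Mirrors the examples above; "read" -> "auto" is an assumption.
const TIER_BY_KIND: Record<ActionKind, ApprovalTier> = {
  read: "auto",
  delete_file: "silent_checkpoint",
  delete_directory: "human_approval",
  download_and_execute: "human_approval",
  system_config_change: "human_approval",
};

// Ordered from least to most strict.
const TIER_ORDER: ApprovalTier[] = ["auto", "silent_checkpoint", "human_approval"];

// A batch needs the strictest tier that any of its actions needs.
function tierForBatch(kinds: ActionKind[]): ApprovalTier {
  return kinds.reduce<ApprovalTier>((strictest, kind) => {
    const tier = TIER_BY_KIND[kind];
    return TIER_ORDER.indexOf(tier) > TIER_ORDER.indexOf(strictest) ? tier : strictest;
  }, "auto");
}

const shortcutCleanupTier = tierForBatch(["read", "delete_file", "delete_file"]);
```

Taking the maximum over the batch avoids a common gap where one risky action hides among many harmless ones.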

3.6 Sandboxed Executors

Real execution must be compartmentalized.

Do not use one all-purpose shell.
Split it into at least:

fs_executor
shell_ro_executor
shell_rw_executor
network_executor
browser_executor
mcp_executor

Each executor has its own constraints:

shell_ro_executor

Only read-only commands are allowed:

Get-ChildItem
Get-Content
rg
Select-String

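shell_rw_executor

Writes are allowed, but every call must carry:

A working directory
A timeout
A path allowlist
A command AST check

network_executor

Only allowlisted domains are permitted; the default is deny.

browser_executor

May operate only on allowed sites with allowed actions.

Key point:

"Tool permissions" and "model permissions" are separate.
The model may propose; only the executor decides whether anything actually happens.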
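A conservative gate for shell_ro_executor could look like the sketch below. It is deliberately not a PowerShell parser: instead of building an AST it rejects any command line containing chaining, redirection, or substitution characters and then checks the first token against the read-only list above. isReadonlyCommand and the constant names are hypothetical.

```typescript
// Allowlist from the shell_ro_executor section; PowerShell command names are case-insensitive.
const READONLY_COMMANDS = new Set(["get-childitem", "get-content", "rg", "select-string"]);

// Characters that chain, redirect, or substitute commands. Rejecting them outright is
// stricter than parsing, and also blocks legitimate read-only pipelines.
const FORBIDDEN_CHARS = /[;|&><`$(){}\r\n]/;

function isReadonlyCommand(commandLine: string): boolean {
  const trimmed = commandLine.trim();
  if (trimmed === "" || FORBIDDEN_CHARS.test(trimmed)) {
    return false;
  }
  const head = trimmed.split(/\s+/)[0].toLowerCase();
  return READONLY_COMMANDS.has(head);
}

const listingAllowed = isReadonlyCommand("Get-ChildItem C:\\Users\\Public\\Desktop");
const chainedRejected = !isReadonlyCommand("Get-Content a.txt; Remove-Item a.txt");
```

This check covers only the command name. Which paths Get-Content may read would still have to go through the path allowlist, and shell_rw_executor would need a real AST check rather than this character filter.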

3.7 State Store

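This is the key to improving accuracy.

You need a state store that records:

Current task objective
Actions already executed
File snapshots
Most recent failure reasons
Allowed scope
Approval results
Rollback points

Without a state store, the AI easily:

Repeats executions
Forgets what it has done
Hallucinates claims such as "I already deleted it"

For example: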

{
  "task_id": "task_desktop_cleanup_001",
  "objective": "clean broken desktop shortcuts",
  "scope": [
    "C:\\Users\\ASUS-KL\\Desktop",
    "C:\\Users\\Public\\Desktop"
  ],
  "artifacts": {
    "found_shortcuts": 31,
    "broken_shortcuts": 12,
    "recycled_shortcuts": 12
  },
  "last_action": "fs.move_to_recycle_bin",
  "rollback_points": [
    "recycle_bin_snapshot_20260423_101100"
  ]
}
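A minimal in-memory version of such a store, focused on the "did this already happen" question, might look like this. TaskState and its methods are hypothetical names; a real store would persist to the database rather than a Map, and treating only verified actions as done is a design assumption.

```typescript
interface ExecutedAction {
  tool: string;
  path: string;
  verified: boolean; // set by post-execution verification, not by the model
}

class TaskState {
  readonly taskId: string;
  private readonly executed = new Map<string, ExecutedAction>();

  constructor(taskId: string) {
    this.taskId = taskId;
  }

  // Windows paths are case-insensitive, so the key is lower-cased.
  private static key(tool: string, path: string): string {
    return `${tool}::${path.toLowerCase()}`;
  }

  record(action: ExecutedAction): void {
    this.executed.set(TaskState.key(action.tool, action.path), action);
  }

  // Only verified actions count as done; a bare claim from the model does not.
  alreadyDone(tool: string, path: string): boolean {
    return this.executed.get(TaskState.key(tool, path))?.verified === true;
  }
}

const cleanupState = new TaskState("task_desktop_cleanup_001");
cleanupState.record({ tool: "fs.move_to_recycle_bin", path: "C:\\Users\\Public\\Desktop\\old.lnk", verified: true });
cleanupState.record({ tool: "fs.move_to_recycle_bin", path: "C:\\Users\\Public\\Desktop\\pending.lnk", verified: false });
```

The executor can consult alreadyDone before acting to avoid repeats, and the reply to the user can be built from recorded state instead of the model's own recollection.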

3.8 Audit / Replay / Observability

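An industrial-grade system must support accountability.

Every execution must record:

Input
Model output
Policy decision
Risk score
Final approval status
Actual command
stdout/stderr
Artifacts
Result verification

Three log categories are recommended:

Decision log

Why the action was allowed or denied

Action log

What was actually done

Verification log

Whether the result matched expectations

Replay support is strongly preferred:

Any past incident can then be replayed in full.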
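The three log categories fit naturally into one append-only stream keyed by task, which is also what makes replay straightforward. The entry fields and the AuditLog class below are hypothetical and much thinner than the full record list above.

```typescript
type AuditEntry =
  | { kind: "decision"; taskId: string; tool: string; decision: "allow" | "deny"; reason: string }
  | { kind: "action"; taskId: string; tool: string; command: string; exitCode: number }
  | { kind: "verification"; taskId: string; tool: string; passed: boolean };

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  // Append-only: there is deliberately no update or delete method.
  append(entry: AuditEntry): void {
    this.entries.push({ ...entry });
  }

  // Replay returns one task's entries in the order they were recorded.
  replay(taskId: string): AuditEntry[] {
    return this.entries.filter((entry) => entry.taskId === taskId).map((entry) => ({ ...entry }));
  }
}

const auditLog = new AuditLog();
auditLog.append({ kind: "decision", taskId: "t1", tool: "fs.move_to_recycle_bin", decision: "allow", reason: "scoped to desktop" });
auditLog.append({ kind: "decision", taskId: "t2", tool: "shell.run_powershell_mutation", decision: "deny", reason: "tool not allowed" });
auditLog.append({ kind: "action", taskId: "t1", tool: "fs.move_to_recycle_bin", command: "recycle old.lnk", exitCode: 0 });
auditLog.append({ kind: "verification", taskId: "t1", tool: "fs.move_to_recycle_bin", passed: true });

const incidentTrace = auditLog.replay("t1").map((entry) => entry.kind);
```

For task t1 the replay yields the decision, action, and verification entries in order, which is the minimum needed to reconstruct why something ran and whether it worked.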

4. Control Flow

4.1 Standard Execution Flow

User request
  ↓
Model Router
  ↓
Proposed actions
  ↓
Policy Engine
  ↓
Risk Scorer
  ↓
Planner/Validator
  ↓
Approval Gate
  ↓
Sandboxed Executor
  ↓
Post-Execution Verifier
  ↓
State Store + Audit
  ↓
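Return to user

4.2 Post-Execution Verification

This is what your current system lacks most.

For example, when the AI says "file created", that claim must be verified:

Whether the file exists
Whether the content is correct
Whether the path is right
Whether any side effects occurred

Example: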

{
  "verification": {
    "type": "file_exists_and_content_match",
    "path": "C:\\Users\\ASUS-KL\\.codex\\.tmp\\tool-action-proof.txt",
    "expected_content": "TOOL-OK",
    "actual": "TOOL-OK",
    "passed": true
  }
}

Without this layer, the AI can easily "say it did something without doing it".
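A verifier producing the structure shown above can be a small pure function. In this sketch the file reader is injected so the same check works against the real file system or a fake; verifyFileContent and the FileReader type are hypothetical names.

```typescript
interface VerificationResult {
  type: "file_exists_and_content_match";
  path: string;
  expected_content: string;
  actual: string | null; // null means the file does not exist
  passed: boolean;
}

// Injected reader: returns the file content, or null when the file is missing.
type FileReader = (path: string) => string | null;

function verifyFileContent(path: string, expected: string, readFile: FileReader): VerificationResult {
  const actual = readFile(path);
  return {
    type: "file_exists_and_content_match",
    path,
    expected_content: expected,
    actual,
    passed: actual !== null && actual === expected,
  };
}

// Fake file system for the example; a real verifier would read from disk.
const fakeFiles = new Map<string, string>([
  ["C:\\Users\\ASUS-KL\\.codex\\.tmp\\tool-action-proof.txt", "TOOL-OK"],
]);
const fakeReader: FileReader = (path) => fakeFiles.get(path) ?? null;

const proofCheck = verifyFileContent(
  "C:\\Users\\ASUS-KL\\.codex\\.tmp\\tool-action-proof.txt",
  "TOOL-OK",
  fakeReader,
);
const missingCheck = verifyFileContent("C:\\nowhere\\absent.txt", "TOOL-OK", fakeReader);
```

The verifier reads the result itself rather than trusting the executor's return value, so a claimed write that never landed shows up as passed: false.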

5. Industrial-Grade State Machine

Each task should go through a finite state machine:

DRAFT
→ PLANNED
→ POLICY_CHECKED
→ APPROVED
→ EXECUTING
→ VERIFYING
→ COMPLETED
→ FAILED
→ ROLLED_BACK

This prevents:

Executing without approval
Reporting success when execution failed
Finishing even though verification did not pass
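The state list can be enforced with an explicit transition table. The states come from the list above, but the branching is an assumption: the source shows a single linear chain, and this sketch reads FAILED as reachable from the checking, executing, and verifying stages, with ROLLED_BACK following FAILED.

```typescript
type TaskStatus =
  | "DRAFT"
  | "PLANNED"
  | "POLICY_CHECKED"
  | "APPROVED"
  | "EXECUTING"
  | "VERIFYING"
  | "COMPLETED"
  | "FAILED"
  | "ROLLED_BACK";

// Allowed next states for each state (the failure branches are an assumption).
const TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  DRAFT: ["PLANNED"],
  PLANNED: ["POLICY_CHECKED", "FAILED"],
  POLICY_CHECKED: ["APPROVED", "FAILED"],
  APPROVED: ["EXECUTING"],
  EXECUTING: ["VERIFYING", "FAILED"],
  VERIFYING: ["COMPLETED", "FAILED"],
  COMPLETED: [],
  FAILED: ["ROLLED_BACK"],
  ROLLED_BACK: [],
};

// Returns the new state, or throws when the move is not in the table.
function advance(from: TaskStatus, to: TaskStatus): TaskStatus {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}

const afterVerification = advance("VERIFYING", "COMPLETED");
```

Because EXECUTING is reachable only from APPROVED and COMPLETED only from VERIFYING, the three failure modes listed above become illegal transitions rather than conventions.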

6. How to Plug This Into Your System

Given your current setup, the suggested landing is:

170 nginx
  ↓
3301 AIClient2API
  ↓
Execution Control Plane  (new)
  ├─ policy-service
  ├─ tool-registry
  ├─ executor-service
  ├─ verifier-service
  └─ audit-store
  ↓
3000 upstream / model

But a more sensible internal order is:

client
  ↓
gateway
  ↓
control-plane
  ↓
model / upstream
  ↓
tool proposals
  ↓
control-plane
  ↓
executors

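That is, the control plane should wrap the model on both sides, not only intercept after it.

7. Minimal Viable Version

Do not try to build the whole platform in one go. Six modules are enough for the first version:

1. Tool Registry
2. Policy Engine
3. Risk Scorer
4. Approval Gate
5. Executors
6. Post-Execution Verifier

And initially govern only 4 kinds of action:

File read
File write
Recycle-bin delete
Shell commands

That alone moves your current system from "can execute" to "controlled execution".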

8. Recommended Directory Structure

Given your preference for a modular layout, the suggestion is:

modules/execution-control/
  module.json
  README.md
  src/
    contracts/
      action.ts
      policy.ts
      approval.ts
      execution-result.ts
    application/
      plan-actions.ts
      evaluate-policy.ts
      score-risk.ts
      request-approval.ts
      execute-action.ts
      verify-result.ts
    domain/
      entities/
        action-plan.ts
        risk-report.ts
        approval-ticket.ts
      services/
        policy-engine.ts
        risk-engine.ts
        verification-engine.ts
    infrastructure/
      policy/
        static-policy-loader.ts
      registry/
        tool-registry.ts
      executors/
        fs-executor.ts
        shell-ro-executor.ts
        shell-rw-executor.ts
      storage/
        audit-repo.ts
        state-repo.ts
    interface/
      http/
        execution-controller.ts
      cli/
        execution-debug.ts
    index.ts
  tests/
    unit/
    integration/
    contract/

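9. Three Hard Rules

Rule 1

The model must never get an unrestricted shell directly.

Rule 2

Every action with side effects must be verifiable.

Rule 3

Every high-risk action must be interruptible, reversible, and attributable.

10. The Evolution Path That Fits You Best Now

Step 1

Split the raw shell into controlled tools: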

fs.list_shortcuts
fs.inspect_shortcut
fs.recycle_item
shell.readonly

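Step 2

Add the Policy Engine plus path allowlists.

Step 3

Add the Post-Execution Verifier.

Step 4

Add the Approval Gate.

Step 5

Then build the unified state machine and audit replay.

11. One-Sentence Summary

The industrial-grade AI execution control architecture you want is, in essence:

The model proposes actions, the control plane adjudicates them, the executors carry them out under restriction, and the verifier proves the result.

The point is not to "give the AI more freedom", but to:

Let the AI work efficiently only within constrained rails.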

