AGENT

Codex Strategy and TokenPool Project Exploration: Session Log

2026/04/28 146 min read AGENT Log

Windows PowerShell Copyright (C) Microsoft Corporation. All rights reserved.

Install the latest PowerShell for new features and improvements! https://aka.ms/PSWindows

>_ OpenAI Codex (v0.123.0-alpha.5)
model: gpt-5.4 medium /model to change
directory: ~
permissions: YOLO mode

Tip: New Build faster with Codex.

› Hello

⚠ Some enabled skills were not included in the model-visible skills list for this session. Mention a skill by name or path if you need it.

• Running Get-Content -Raw C:\Users\ASUS-KL\.codex\instructions.md

• Running Get-Content -Raw C:\Users\ASUS-KL\.codex\policy\defaults.md

• Running Get-Content -Raw C:\Users\ASUS-KL\.codex\policy\gate-registry.json

• Running Get-Content -Raw C:\Users\ASUS-KL\.codex\policy\annotated.md

• Ran Get-Content -Raw C:\Users\ASUS-KL\.codex\policy\defaults.md └ # Defaults

… +78 lines (ctrl + t to view transcript)
…and must not overshadow the main work for long.

• Ran Get-Content -Raw C:\Users\ASUS-KL\.codex\policy\hard-gates.md └ # Hard Gates

… +136 lines (ctrl + t to view transcript)
If push / PR was not done, explain why.

• Ran Get-Content -Raw C:\Users\ASUS-KL\.codex\policy\annotated.md └ # Policy Annotated

… +479 lines (ctrl + t to view transcript)
- If a new task concerns the unified policy control plane, update `C:\Users\ASUS-KL\.codex\policy\annotated.md` first by default

• Ran Get-Content -Raw C:\Users\ASUS-KL\.codex\instructions.md └ # Global Codex Instructions

… +136 lines (ctrl + t to view transcript)

• Ran Get-ChildItem -Path C:\Users\ASUS-KL -Directory | Select-Object Name,FullName └ Name FullName │ $roots) { if (Test-Path $r) { Get-ChildItem -Path $r -Directory -Recurse -ErrorAction SilentlyContinue | Where-Object { $_.Name │ … +1 lines └ (no output)

• Ran $roots = @('E:\My Project','C:\Users\ASUS-KL'); foreach ($r in $roots) { if (Test-Path $r) { rg --files $r 2>$null | rg │ "tokenpool|TokenPool|README.md$|project.json$" } } └ E:\My Project\README.md E:\My Project\sites-static\README.md … +1367 lines (ctrl + t to view transcript) C:\Users\ASUS-KL\Documents\Cline\MCP\desktop-commander-local\node_modules\pdf-lib\README.md C:\Users\ASUS-KL\cloude-app-staging\node_modules\react-native\ReactAndroid\src\main\java\com\facebook\react\runtime\README.md

• Running Get-Content -Raw 'E:\My Project\Token Pool\project.json'

• Running Get-ChildItem -Path 'E:\My Project\Token Pool' -Force | Select-Object Mode,Name,Length

• Ran Get-Content -Raw 'E:\My Project\Token Pool\README.md' └ --- title: Token Pool … +2649 lines (ctrl + t to view transcript)

• Ran Get-ChildItem -Path 'E:\My Project\Token Pool' -Force | Select-Object Mode,Name,Length └ Mode Name Length … +40 lines (ctrl + t to view transcript) -a--- VERSION 10 - Whether any old file names that have been deleted are still referenced.

• Ran Get-Content -Raw 'E:\My Project\Token Pool\project.json' └ { "$schema": "./templates/project-contract/project.schema.json", … +157 lines (ctrl + t to view transcript) }

• Running Get-Content -Raw 'E:\My Project\Token Pool\package.json'

• Running Get-ChildItem -Path 'E:\My Project\Token Pool\src' -Recurse -Depth 2 | Select-Object FullName

• Running Get-ChildItem -Path 'E:\My Project\Token Pool\frontend' -Recurse -Depth 2 | Select-Object FullName

• Ran Get-ChildItem -Path 'E:\My Project\Token Pool\cli' -Recurse -Depth 2 | Select-Object FullName └ FullName … +5 lines (ctrl + t to view transcript)

• Ran Get-ChildItem -Path 'E:\My Project\Token Pool\src' -Recurse -Depth 2 | Select-Object FullName └ FullName … +179 lines (ctrl + t to view transcript) E:\My Project\Token Pool\src\utils\token-utils.js

• Ran Get-ChildItem -Path 'E:\My Project\Token Pool\frontend' -Recurse -Depth 2 | Select-Object FullName └ FullName … +10 lines (ctrl + t to view transcript) E:\My Project\Token Pool\frontend\styles\app.css

• Ran Get-Content -Raw 'E:\My Project\Token Pool\package.json' └ { "name": "token-pool", … +40 lines (ctrl + t to view transcript) }

• Running Get-ChildItem -Path 'E:\My Project\Token Pool\configs' -Recurse -Depth 2 | Select-Object FullName

• Running Get-ChildItem -Path 'E:\My Project\Token Pool\scripts' -Recurse -Depth 2 | Select-Object FullName

• Running Get-ChildItem -Path 'E:\My Project\Token Pool\deploy' -Recurse -Depth 2 | Select-Object FullName

• Running Get-ChildItem -Path 'E:\My Project\Token Pool\tests' -Recurse -Depth 2 | Select-Object FullName

• Ran Get-ChildItem -Path 'E:\My Project\Token Pool\scripts' -Recurse -Depth 2 | Select-Object FullName └ FullName … +4 lines (ctrl + t to view transcript) E:\My Project\Token Pool\scripts\verify-runtime-sync.ps1

• Ran Get-ChildItem -Path 'E:\My Project\Token Pool\docs' -Recurse -Depth 2 | Select-Object FullName └ (no output)

• Ran @' │ const fs = require('fs'); │ const path = require('path'); │ … +9 lines └ Y .babelrc Y .dockerignore … +33 lines (ctrl + t to view transcript) Y tls-sidecar Y VERSION

• Running Get-ChildItem -Path 'E:\My Project\Token Pool\llm-integration' -Recurse -Depth 3 | Select-Object FullName ↓

• Ran Get-ChildItem -Path 'E:\My Project\Token Pool\static' -Recurse -Depth 2 | Select-Object FullName └ FullName … +70 lines (ctrl + t to view transcript) E:\My Project\Token Pool\static\components\sidebar.html

• Ran Get-ChildItem -Path 'E:\My Project\Token Pool.github\workflows' | Select-Object Name └ │ Project\Token Pool\configs\codex' └ False True True

• Running Get-Content -Raw 'E:\My Project\Token Pool\install.sh' │ Project\Token Pool\README.md' -Pattern $_ -SimpleMatch | ForEach-Object { "{0}:{1}" -f $.LineNumber, $.Line.Trim() } } └ --- api-potluck-data.json 1134:│ ├── api-potluck-data.json.example … +7 lines (ctrl + t to view transcript) --- llm-integration --- frontend/.codex_patch_pass2.py

• Ran Get-Content -TotalCount 40 'E:\My Project\Token Pool\llm-integration\README.md' └ <!-- [[[cog # README.md is generated from docs/index.md using sphinx_markdown_builder … +36 lines (ctrl + t to view transcript) - Run prompts from the command-line - Store prompts and responses in SQLite

• Ran @' │ const fs = require('fs'); │ const path = require('path'); │ … +18 lines └ Top-level names absent from README: [ 'llm-integration' ] Has src/workflows dir? false … +3 lines (ctrl + t to view transcript) src/handlers/request-handler.js.ui-separate-20260423-142649 true static/login.html.ui-separate-20260423-142649 true

• Updated Plan └ Located the canonical README and compared it against current project metadata, directory structure, install scripts, and supporting docs. ✔ Locate the TokenPool repository and identify the canonical README plus project metadata/code surface to compare against it □ Inspect the README against current implementation, config, and docs to find missing, outdated, or inaccurate information □ Report findings with file references, note assumptions/gaps, and give a verification-based verdict on README completeness

• Updated Plan └ Finished the README-vs-repo comparison and assembled the concrete mismatches and omissions. ✔ Locate the TokenPool repository and identify the canonical README plus project metadata/code surface to compare against it ✔ Inspect the README against current implementation, config, and docs to find missing, outdated, or inaccurate information □ Report findings with file references, note assumptions/gaps, and give a verification-based verdict on README completeness

─ Worked for 7m 15s ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────

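• [STEP] Audit the coverage of Token Pool's canonical README and its consistency with the current repository
  [WHY] You want to confirm whether README.md is complete and contains all current project information
  [ACTION] I compared E:\My Project\Token Pool\README.md, E:\My Project\Token Pool\project.json, the repository's current directory structure, and real files such as the install scripts and CLI docs
  [RESULT] Conclusion: README.md is very detailed, but it cannot yet be called "complete and covering all current information"; it has several clear inaccuracies and omissions

  • High: README.md explicitly claims that it "lists only items that really exist; backups, temporary copies, and leftover noise have been moved to a separate block below", but quite a few backup/temporary files still sit in the official directories, so the structure description is already inaccurate. See README.md:1125, README.md:1290; still present: src/converters/strategies/OpenAIConverter.js.bak-20260424-073810, src/providers/openai/openai-core.js.pre-soxio-unary-20260423-145854, src/handlers/request-handler.js.ui-separate-20260423-142649, static/login.html.ui-separate-20260423-142649, frontend/.codex_patch_pass2.py
  • High: the README itself requires that "the documentation must cover every category of official file in the project" and that "every official file in this repo must have a corresponding locator or explanation entry in this document", yet it currently omits at least one real top-level subproject, llm-integration/, and does not say whether it is a vendored/nested repo or a runtime dependency. See README.md:2567, README.md:2614, compared against the actual llm-integration\README.md and llm-integration\pyproject.toml
  • Medium: configuration coverage is incomplete. The README's configs/ tree lists only some files and omits configs/api-potluck-data.json, configs/api-potluck-keys.json, and the entire configs/codex/ subdirectory, all of which exist and clearly have responsibilities; this contradicts its goal of "covering official file categories". Compare README.md:1132, README.md:2576 with the actual configs/api-potluck-data.json, configs/api-potluck-keys.json, configs/codex/codex-default.json install-and-run.ps1:5, install-and-run.ps1:65, install-and-run.ps1:67
  • Medium: the README's statement that "the install script automatically tests the CLI tool and the server connection" does not warn that the health check and process matching in the old scripts still assume old paths: install.sh still matches server.js, checks /api/health, and prints the local /openai/v1 entry, which mixes old assumptions into the current runtime truth. See README.md:190; compare install.sh:102, install.sh:106, install.sh:149
  • Conclusion: if the bar is "serve as the project's main entry document, covering the main runtime truth and core boundaries", it largely passes; if the bar is "already reflects all important information in the current repository with no obvious inaccuracies", it does not pass yet
  • I think what most needs fixing is not a longer body, but three classes of problem first: 1) sync the structure tree with the real file system; 2) clarify llm-integration/, configs/codex/, and the existing backup…
  • Verification: I directly read and compared README.md, project.json, package.json, install.sh, install-and-run.ps1, and cli/README.md, and enumerated the current contents of src/, frontend/, configs/, deploy/, tests/, llm-integration/, and tls-sidecar/
  • Delivery state: not-needed; this was a documentation review, with no code changes and no deployment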
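
The omission findings above come down to one check: which names exist on disk but never appear in the README. A minimal sketch of that check follows. The script actually run in the session is elided in the transcript, so `namesAbsentFromReadme` and the sample inputs are hypothetical.

```javascript
// Hypothetical helper: report directory entries that the README never mentions.
// In a real run, `entries` would come from fs.readdirSync(repoRoot) and
// `readmeText` from fs.readFileSync(readmePath, "utf8").
function namesAbsentFromReadme(entries, readmeText) {
  return entries.filter(name => !readmeText.includes(name));
}

// Sample data for illustration only.
const sampleEntries = ["src", "configs", "llm-integration"];
const sampleReadme = "├── src/\n├── configs/\n";

console.log(namesAbsentFromReadme(sampleEntries, sampleReadme)); // [ 'llm-integration' ]
```

A plain substring match is deliberately loose: a short name such as `src` also matches inside longer words, so a stricter version would match on tree lines or path boundaries.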

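If you like, my next step can be one of these:

  1. Give you a patch list of missing/outdated README items
  2. Fix README.md directly so that it matches the current repository
  3. Converge install-and-run.ps1 / install.sh to the current truth as well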
□ Inspect current README sections and install scripts in detail, plus git status, to scope exact edits safely
□ Update README.md to match the real repo structure, current file ownership, and install/runtime truth
□ Update install-and-run.ps1 and install.sh to match current Token Pool naming, port, health paths, and startup behavior
□ Run targeted verification on the edited files and report the resulting truth/alignment status

• Running git -C 'E:\My Project\Token Pool' status --short • Ran Get-Content -Raw 'E:\My Project\Token Pool\install.bat' └ @echo off REM Token Pool 项目安装脚本 (Windows) … +127 lines (ctrl + t to view transcript) pause

• Ran Get-Content -Raw 'E:\My Project\Token Pool\install-and-run.bat' └ @echo off setlocal … +13 lines (ctrl + t to view transcript) endlocal

?? tests/provider-pool-manager-health-contract.unit.test.js

• Running Select-String -Path 'E:\My Project\Token Pool\src\services\api-server.js' -Pattern '/health|/api/health|/openai/health| │ 3301|listen(' | ForEach-Object { "{0}:{1}" -f $.LineNumber, $.Line.Trim() }

• Running Select-String -Path 'E:\My Project\Token Pool\cli\tpswitch.py' -Pattern '3301|3000|/api/health|/health|/openai/v1| │ default_url|base_url' | ForEach-Object { "{0}:{1}" -f $.LineNumber, $.Line.Trim() }

• Ran Select-String -Path 'E:\My Project\Token Pool\cli\tpswitch.py' -Pattern '3301|3000|/api/health|/health|/openai/v1|default_url| │ base_url' | ForEach-Object { "{0}:{1}" -f $.LineNumber, $.Line.Trim() } └ 16:def init(self, base_url: str = "http://127.0.0.1:3301"): 17:self.base_url = base_url … +3 lines (ctrl + t to view transcript) │ … +431 lines … +7 lines (ctrl + t to view transcript) raise s.error('bad escape %s' % this, len(this)) from None re.PatternError: bad escape \i at position 338 (line 12, column 54)

• Ran @' │ from pathlib import Path │ import re │ … +446 lines └ Traceback (most recent call last): File "", line 62, in ValueError: substring not found

• Ran Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern '^### .*工作副本' | ForEach-Object { "{0}:{1}" -f │ $.LineNumber, $.Line } └ 1286:### 工作副本残余(仅用于说明,不纳入目标态)

• Ran @' │ from pathlib import Path • Ran Get-Content 'E:\My Project\Token Pool\README.md' | Select-Object -Index (170..230) 公网客户端 / Codex ↓ index bd2c952..136a075 100644 … +874 lines (ctrl + t to view transcript) +echo " 4. Check console health: http://127.0.0.1:3301/api/health" +echo " 5. OpenAI-compatible URL: http://127.0.0.1:3301/openai/v1"

• Ran Get-Content 'E:\My Project\Token Pool\README.md' | Select-Object -Index (1120..1310) └ -> frontend/src/pages/compatibility/LegacyDocsPage -> docs / help entry … +187 lines (ctrl + t to view transcript) - 控制台内部目标:http://127.0.0.1:3301/

• Ran (Get-Content -Raw 'E:\My Project\Token Pool\README.md').Replace('### 当前工���树残余(存在但不属于 canonical truth)','### 当前 │ 工作树残余(存在但不属于 canonical truth)') | Set-Content -NoNewline 'E:\My Project\Token Pool\README.md' └ (no output)

• Running Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern '当前工作树残余|llm-integration|configs/codex|Token Pool │ Quick Start|127.0.0.1:3301/health' | ForEach-Object { "{0}:{1}" -f $.LineNumber, $.Line.Trim() }

• Running bash -n 'E:\My Project\Token Pool\install.sh'; bash -n 'E:\My Project\Token Pool\install-and-run.sh'

• Ran Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern '当前工作树残余|llm-integration|configs/codex|Token Pool │ Quick Start|127.0.0.1:3301/health' | ForEach-Object { "{0}:{1}" -f $.LineNumber, $.Line.Trim() } └ 205:- 本地 runtime 健康检查:http://127.0.0.1:3301/health 952:内部验证:http://127.0.0.1:3301/health 1184:├── llm-integration/ # 内嵌上游子仓;不是 Token Pool runtime 的 canonical 源树 1252:### 当前工作树残余(存在但不属于 canonical truth)

• Ran @' │ $null = $errors = $null ✔ Inspect current README sections and install scripts in detail, plus git status, to scope exact edits safely ✔ Update README.md to match the real repo structure, current file ownership, and install/runtime truth ✔ Update install-and-run.ps1 and install.sh to match current Token Pool naming, port, health paths, and startup behavior □ Run targeted verification on the edited files and report the resulting truth/alignment status

• ...

• Ran git -C 'E:\My Project\Token Pool' diff --stat HEAD -- README.md install-and-run.ps1 install.sh install-and-run.sh

• Ran @' │ const fs = require('fs'); │ const root = 'E:/My Project/Token Pool'; PASS install.sh checks 3301 health

• Updated Plan └ Verified the edits via targeted content checks and PowerShell parsing. Bash syntax verification was attempted but blocked because /bin/bash is unavailable in this environment. ✔ Inspect current README sections and install scripts in detail, plus git status, to scope exact edits safely ✔ Update README.md to match the real repo structure, current file ownership, and install/runtime truth ✔ Update install-and-run.ps1 and install.sh to match current Token Pool naming, port, health paths, and startup behavior ✔ Run targeted verification on the edited files and report the resulting truth/alignment status

• Running Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern 'Canonical 启动入口(Windows / PowerShell)| │ llm-integration|当前工作树残余|provider pool 真值不再维护 repo-local|本地 runtime 健康检查' | ForEach-Object { "README.md: │ {0}:{1}" -f $.LineNumber, $.Line.Trim() }

• Ran Select-String -Path 'E:\My Project\Token Pool\install-and-run.ps1' -Pattern 'Token Pool Quick Start|127.0.0.1:3301/health| │ OpenAI endpoint' | ForEach-Object { "install-and-run.ps1:{0}:{1}" -f $.LineNumber, $.Line.Trim() }; Select-String -Path 'E: │ \My Project\Token Pool\install.sh' -Pattern 'Token Pool bootstrap|3301/health|Bootstrap complete' | ForEach-Object │ … +3 lines └ install-and-run.ps1:5:Write-Host " Token Pool Quick Start" -ForegroundColor Cyan install-and-run.ps1:64:Write-Host "Runtime health: http://127.0.0.1:3301/health" … +6 lines (ctrl + t to view transcript) install-and-run.sh:56:echo "Runtime health: http://127.0.0.1:3301/health" install-and-run.sh:58:echo "OpenAI endpoint: http://127.0.0.1:3301/openai/v1"

• Ran Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern 'Canonical 启动入口(Windows / PowerShell)|llm-integration| │ 当前工作树残余|provider pool 真值不再维护 repo-local|本地 runtime 健康检查' | ForEach-Object { "README.md:{0}:{1}" -f │ $.LineNumber, $.Line.Trim() } └ README.md:177:- Canonical 启动入口(Windows / PowerShell):install-and-run.ps1 README.md:205:- 本地 runtime 健康检查:http://127.0.0.1:3301/health … +4 lines (ctrl + t to view transcript) README.md:1184:├── llm-integration/ # 内嵌上游子仓;不是 Token Pool runtime 的 canonical 源树 README.md:1252:### 当前工作树残余(存在但不属于 canonical truth)

─ Worked for 1m 12s ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────

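• [STEP] Converge README.md and the install scripts to the current Token Pool truth
  [WHY] You asked me to fix the documentation directly and align the install entry points with current runtime facts
  [ACTION] I updated README.md, install-and-run.ps1, and install.sh, and also converged the equivalent Unix entry point install-and-run.sh so the README no longer points to an outdated script
  [RESULT] The documentation and startup scripts are now largely consistent with the current repository and runtime reality

  • I fixed the README's install entry and local runtime truth: the canonical startup entry is unified to install-and-run.ps1 / install-and-run.sh, and local 3301, /health, /api/health, and /openai/v1 are added; see README.md:177, README.md:205 170.106.179.226:/srv/token-pool-gateway-prod/configs/provider_pools.json, see README.md:400
  • I changed the README's wording about leftover files to "exist but are not part of the canonical truth", so it no longer pretends these files have been cleaned out of the working tree; see README.md:1252
  • I converged the PowerShell startup script from the old AI Client 2 API / localhost:3000 to Token Pool / 127.0.0.1:3301; see install-and-run.ps1:5, install-and-run.ps1:64, install-and-run.ps1:66
  • I changed install.sh's local check logic to the current real paths, dropped the old server.js/3000 narrative, switched to checking runtime/console health on 3301, and kept its bootstrap role; see install.sh:7, install.sh:85, install.sh:94
  • I also fixed install-and-run.sh so the Unix entry the README points to no longer carries the old product name and port; see install-and-run.sh:7, install-and-run.sh:56, install-and-run.sh:58
  • Verification passed: targeted content checks confirm the README now contains llm-integration, configs/codex, and the 3301 truth, and the startup scripts no longer contain the old AI Client 2 API/3000 text
  • Verification passed: install-and-run.ps1 was parsed with the PowerShell syntax parser; result OK
  • Verification limited: I tried a bash -n syntax check on install.sh / install-and-run.sh, but no usable /bin/bash exists in this environment, so this step is blocked by environment, not a script error
  • live verification: not-needed
  1. Converge install.bat to the same current truth as well
  2. Keep cleaning README passages that still read as "historical narrative", compressing them into a shorter, more current-state manual
  3. Scan project.json and the README once more for leftover dual-truth conflicts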

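› 1. Converge install.bat to the same current truth as well 2. Keep cleaning README passages that still read as "historical narrative", compressing them into a shorter, more current-state manual 3. Scan project.json and the README once more for leftover dual-truth conflicts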

□ Update `install.bat` to match current Token Pool naming, port, health paths, and bootstrap/start guidance
□ Trim and clean README sections that are still overly historical while preserving current-state operational truth
□ Cross-check `project.json` against the updated README, fix any remaining conflicts, and verify the edited files

• Running Get-Content -Raw 'E:\My Project\Token Pool\install.bat'

• Running Get-Content -Raw 'E:\My Project\Token Pool\project.json'

• Ran Get-Content -Raw 'E:\My Project\Token Pool\project.json' └ { "$schema": "./templates/project-contract/project.schema.json", … +157 lines (ctrl + t to view transcript) }

• Ran Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern '历史|已废弃|阶段|草案|兼容|legacy|AIClient2API|aiclient2api| • Ran Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern '历史|已废弃|阶段|草案|兼容|legacy|AIClient2API|aiclient2api| 28:consoleLegacyUrl: https://pool-console.tengokukk.com/legacy/ … +145 lines (ctrl + t to view transcript) 2598:- 规则维护:唯一真源规则、文档覆盖强制规则、阶段演进规范。 2618:- 阶段、附录、架构边界是否仍无冲突。

… +26 lines (ctrl + t to view transcript)
PASS    frontendDesignDoc    frontend/CONSOLE-REDESIGN.md
PASS    consoleDesignLanguage    Signal Room / 硬派主控台

• Ran @' │ from pathlib import Path │ import json │ … +181 lines └ (no output)

• Running Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern '文档策略|如果需要运行拓扑|provider_pools.json|12.2 使用 │ 方式|唯一主说明文件|当前运行真相|变更边界' | ForEach-Object { "README.md:{0}:{1}" -f $.LineNumber, $.Line.Trim() }

• Running Select-String -Path 'E:\My Project\Token Pool\install.bat' -Pattern 'Token Pool bootstrap|3301/health|install-and-run.bat| │ OpenAI-compatible URL' | ForEach-Object { "install.bat:{0}:{1}" -f $.LineNumber, $.Line.Trim() }

• Ran Get-Content -Raw 'E:\My Project\Token Pool\project.json' └ { "$schema": "./templates/project-contract/project.schema.json", … +157 lines (ctrl + t to view transcript) }

README.md:2595:- 正文维护:项目概述、架构边界、验收标准、术语、对外简介与当前运行真相。
README.md:2598:- 规则维护:唯一真源规则、文档覆盖强制规则与变更边界。

• Ran Get-Content 'E:\My Project\Token Pool\README.md' | Select-Object -Index (40..55); (Get-Content -Raw 'E:\My Project\Token │ Pool\README.md').Replace('- provider 问题优先查 provider_pools.json、健康状态与路由决策。','- provider 问题优先查 │ 170.106.179.226:/srv/token-pool-gateway-prod/configs/provider_pools.json、健康状态与路由决策。') | Set-Content 'E:\My │ … +1 lines └ deploymentMode: github-actions-self-hosted-local-deploy deploymentRunner: token-pool-prod-170 … +13 lines (ctrl + t to view transcript)

### 0.2 快速开始

└ (no output)

• Running Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern '如果需要运行拓扑|provider 问题优先查 170.106.179.226:/ │ srv/token-pool-gateway-prod/configs/provider_pools.json|12.2 使用方式|README.md`:唯一主说明文件' | ForEach-Object │ { "README.md:{0}:{1}" -f $.LineNumber, $.Line.Trim() }

• Running @' │ const fs = require('fs'); │ const root = 'E:/My Project/Token Pool'; │ … +19 lines

• Ran Select-String -Path 'E:\My Project\Token Pool\README.md' -Pattern '如果需要运行拓扑|provider 问题优先查 170.106.179.226:/srv/ │ token-pool-gateway-prod/configs/provider_pools.json|12.2 使用方式|README.md:唯一主说明文件' | ForEach-Object { "README.md: │ {0}:{1}" -f $_.LineNumber, $_.Line.Trim() } └ README.md:47:- 如果需要运行拓扑、结构分层、部署路径与验证顺序,继续看后续正文。 README.md:2452:### 12.2 使用方式 README.md:2528:- README.md`:唯一主说明文件,承载项目结构、当前态边界、部署路径与运行边界

• Ran git -C 'E:\My Project\Token Pool' diff --stat HEAD -- README.md project.json install.bat └ README.md | 318 +++++++++++++++++++++++++++-------------------------------- install.bat | 136 +++++++++++-------------- project.json | 61 +++++++++++- 3 files changed, 268 insertions(+), 247 deletions(-)

• Ran @' │ const fs = require('fs'); │ const root = 'E:/My Project/Token Pool'; │ … +19 lines └ PASS README references project install entry windows PASS README references project install entry batch … +6 lines (ctrl + t to view transcript) PASS install.bat points to 3301 health PASS install.bat points to install-and-run entry

• Ran git -C 'E:\My Project\Token Pool' diff --stat HEAD -- README.md project.json install.bat └ README.md | 318 +++++++++++++++++++++++++++-------------------------------- install.bat | 136 +++++++++++-------------- project.json | 61 +++++++++++- 3 files changed, 268 insertions(+), 247 deletions(-)

• Updated Plan └ Updated install.bat, trimmed more historical README wording toward current-state guidance, and aligned project.json terminology with the README. ✔ Inspect install.bat, project.json, and the remaining README sections to identify outdated wording and metadata mismatches ✔ Update install.bat to match current Token Pool naming, port, health paths, and bootstrap/start guidance ✔ Trim and clean README sections that are still overly historical while preserving current-state operational truth ✔ Cross-check project.json against the updated README, fix any remaining conflicts, and verify the edited files

─ Worked for 2m 49s ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────

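• [STEP] Converge install.bat, keep slimming README.md, and re-scan project.json/README truth consistency
  [WHY] You want the Windows batch entry brought back to the current truth too, and the README compressed further toward a current-state manual
  [ACTION] I updated install.bat, continued cleaning historical/phase-narrative wording from README.md, and synced the garbled terminology notes left in project.json to the same current semantics as the README
  [RESULT] All three items are done; no new project.json / README.md dual-truth conflict has been found

  • …the next step now points to install-and-run.bat / install-and-run.ps1; see install.bat:6, install.bat:91, install.bat:106
  • The README's opening strategy is now more clearly current-state, rather than "phase planning first, truth second"; see README.md:10
  • The README's guide entry is also tightened to "runtime topology / structural layering / deployment paths / verification order", with less of a "phase planning" tone; see README.md:47
  • I also removed one leftover old statement in the README: the file to check first for provider troubleshooting is now explicitly the remote runtime provider_pools.json, no longer an ambiguous repo-local name; see README.md:480
  • I further downgraded the roadmap section to "for reference only; does not override current-state truth", so it reads like an operations manual rather than project history; see README.md:2452
  • termResolutionPolicy in project.json is no longer garbled and matches the README's explanations of key-related terms, such as "a key in the pool", "the local Codex key", and "add a key to the pool"; see project.json
  • Consistency re-scan result: README.md covers the key project.json fields, including the install entry, the provider pool source-of-truth path, the console design doc, and the console design language; this round found no new README/project.json conflict
  • Verification passed: targeted content checks confirm the README contains project.json.installEntry, project.json.providerPoolSourceOfTruth.runtimePath, project.json.frontendDesignDoc, and project.json.consoleDesignLanguage
  • Verification passed: install.bat no longer contains the old 3000 / localhost:3000 truth and points to 3301 and the current startup entry
  • Verification note: Windows .bat has no ready-made standalone syntax parser like PowerShell's; this round I used content-consistency checks in place of a dry run, to avoid actually triggering the install flow
  • delivery state: not-needed
  • deploy decision: not-needed
  • live verification: not-needed

If you want, my most natural next steps are:

  1. Do another round of hard cuts on the README's larger "phase/migration/compatibility" narrative blocks, compressing them into a shorter operations manual
  2. Keep scanning install-and-run.bat / cli/README.md / install.bat for remaining inconsistencies between the three
  3. Organize this round's documentation and script changes into a minimal, committable set of diffs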

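› What you're asking now is really no longer "how to use AI", but:

 how to build a truly industrial-grade Token Pool (a cost + performance + stability control system)

The metrics you listed are right, but what's missing now is:

❗ "how these metrics drive scheduling decisions"

I'll give you an optimization plan you can actually ship (algorithm level + architecture level)


I. The essence of your current problem

If your pool is currently showing:

  • rising token consumption
  • several-fold waste
  • an unstable experience

it is usually because:

1️⃣ there is no caching strategy (or the hit rate is low)
2️⃣ there is no dynamic scheduling (random/round-robin)
3️⃣ there is no quality feedback (failed providers keep being used)
4️⃣ there is no cost weighting (expensive models get overused)

II. Core idea (remember this sentence first)

 Token Pool is essentially a "scheduling system under cost constraints"

Not:

❌ a list of accounts

But:

✅ a real-time scoring + routing decision engine

III. Core metrics → converted into "scheduling weights"

You currently have these fields:

  • request count
  • cache hits
  • time to first token
  • total duration
  • success rate (implicit)
  • recent performance

They need to be turned into one thing:

 score

✨ Recommended scoring formula (usable as-is)
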
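function computeScore(provider) {
  // Guard against division by zero for providers with no traffic yet
  const successRate =
    provider.requests > 0 ? provider.success / provider.requests : 0

  // Lower time to first token gives a higher score
  const latencyScore =
    1 / (provider.avgFirstTokenTime + 1)

  // Lower total duration gives a higher score
  const speedScore =
    1 / (provider.avgTotalTime + 1)

  const cacheScore = provider.cacheHitRate || 0

  // Halve the score after more than 3 recent failures
  const stabilityPenalty =
    provider.recentFailures > 3 ? 0.5 : 1

  // Cheaper providers get a larger weight
  const costWeight =
    1 / (provider.costPer1kTokens || 1)

  return (
    successRate * 0.3 +
    latencyScore * 0.2 +
    speedScore * 0.2 +
    cacheScore * 0.1 +
    costWeight * 0.2
  ) * stabilityPenalty
}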

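Explanation (key points):

Factor            Effect
successRate       unstable providers are down-weighted directly
latencyScore      a fast first token is preferred
speedScore        efficiency on long text
cacheScore        saves tokens
costWeight        controls cost
stabilityPenalty  prevents flapping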
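
One practical caveat with the formula: `1 / (time + 1)` is tiny when times are in milliseconds, so the latency terms barely move the score next to `successRate`. A minimal sketch of one way around that, assuming min-max normalization across the pool, is shown below. `normalize`, `scorePool`, and the sample providers are illustrative names and data, not part of the original proposal.

```javascript
// Min-max normalize a metric across the pool so every term lies in [0, 1].
function normalize(values, lowerIsBetter = false) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 1); // all equal: nobody is penalized
  return values.map(v =>
    lowerIsBetter ? (max - v) / (max - min) : (v - min) / (max - min)
  );
}

// Same weights as the formula above, applied to normalized terms.
function scorePool(pool) {
  const ttft = normalize(pool.map(p => p.avgFirstTokenTime), true);
  const total = normalize(pool.map(p => p.avgTotalTime), true);
  const cost = normalize(pool.map(p => p.costPer1kTokens), true);

  return pool.map((p, i) => {
    const successRate = p.requests > 0 ? p.success / p.requests : 0;
    const penalty = p.recentFailures > 3 ? 0.5 : 1;
    const score =
      (successRate * 0.3 + ttft[i] * 0.2 + total[i] * 0.2 +
        (p.cacheHitRate || 0) * 0.1 + cost[i] * 0.2) * penalty;
    return { name: p.name, score };
  });
}

// Illustrative data only.
const scored = scorePool([
  { name: "fast-cheap", requests: 100, success: 95, avgFirstTokenTime: 300,
    avgTotalTime: 2000, costPer1kTokens: 0.5, cacheHitRate: 0.2, recentFailures: 0 },
  { name: "slow-pricey", requests: 100, success: 99, avgFirstTokenTime: 900,
    avgTotalTime: 6000, costPer1kTokens: 3, cacheHitRate: 0.2, recentFailures: 0 },
]);
console.log(scored);
```

With normalized terms, the slightly more reliable but much slower and pricier provider ranks below the fast, cheap one, which is the trade-off the weights are meant to express.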

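IV. Scheduling algorithm (core)

❌ Stop using:

random / round-robin

✅ Use: Weighted Routing (with exploration)

function pickProvider(pool) {
  // 10% exploration: pick uniformly at random
  if (Math.random() < 0.1) {
    return pool[Math.floor(Math.random() * pool.length)]
  }

  // 90% exploitation: pick with probability proportional to score
  const total = pool.reduce((sum, p) => sum + p.score, 0)
  let rand = Math.random() * total
  for (const p of pool) {
    rand -= p.score
    if (rand <= 0) return p
  }
  return pool[pool.length - 1]
}

This is:

 a Multi-Armed Bandit (simplified)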
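
Random selection is hard to verify by eye, so it can help to make the random source injectable. The sketch below is one way to do that under stated assumptions: the `rng` and `epsilon` parameters and the `replay` helper are additions for illustration, not part of the original snippet.

```javascript
// rng is injectable so the selection can be tested deterministically.
function pickProvider(pool, rng = Math.random, epsilon = 0.1) {
  if (pool.length === 0) throw new Error("EMPTY_POOL");

  // Exploration: with probability epsilon, pick uniformly at random.
  if (rng() < epsilon) {
    return pool[Math.floor(rng() * pool.length)];
  }

  // Exploitation: pick with probability proportional to (non-negative) score.
  const total = pool.reduce((sum, p) => sum + Math.max(p.score, 0), 0);
  if (total <= 0) return pool[Math.floor(rng() * pool.length)];

  let r = rng() * total;
  for (const p of pool) {
    r -= Math.max(p.score, 0);
    if (r <= 0) return p;
  }
  return pool[pool.length - 1]; // guards against floating-point rounding
}

// Fake rng that replays fixed values (for illustration and tests).
const replay = values => {
  let i = 0;
  return () => values[i++];
};

const demoPool = [{ name: "a", score: 1 }, { name: "b", score: 3 }];
console.log(pickProvider(demoPool, replay([0.5, 0.2])).name);   // exploit branch, picks "a"
console.log(pickProvider(demoPool, replay([0.5, 0.5])).name);   // exploit branch, picks "b"
console.log(pickProvider(demoPool, replay([0.05, 0.99])).name); // explore branch, picks "b"
```

Over many calls with the default `Math.random`, provider "b" should receive roughly three times the exploit traffic of "a", plus an even share of the exploration traffic.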

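V. Cache optimization (your biggest cost saver)

Your current cache hit rate:

 is a "result", not a "strategy"

✨ Three-layer cache (a must)

L1: exact-match cache (hash(prompt))
L2: semantic cache (embedding similarity)
L3: template cache (system + few-shot)

Example

function getCachedResponse(prompt) {
  // L1: exact match on the prompt hash
  const exact = cache.get(hash(prompt))
  if (exact) return exact

  // L2: nearest semantic neighbor, accepted only above the similarity threshold
  const similar = semanticCache.search(prompt)
  if (similar && similar.score > 0.92) return similar.result

  return null
}

Improvement:

hit rate from 5% → 30%+ (very common)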

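VI. Token throttling (you almost certainly don't have this)

❌ Problem

Same account:

high concurrency in a short time → rate limiting / account ban / failures

✅ Solution: Token Budget

if (provider.tokensUsedLastMinute > limit) {
  provider.cooldown = true
}

Plus:

a sliding window (1 min / 5 min)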
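
The two windows can be sketched as one small class instantiated twice. This is a minimal illustration under assumptions: the `TokenBudget` class, the limits, and the explicit `now` parameter (used to keep the example deterministic) are all hypothetical.

```javascript
// Sliding-window token budget: each entry keeps its own timestamp and token count.
class TokenBudget {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.entries = []; // { ts, tokens }, oldest first
  }

  // Drop entries that have left the window, then sum what remains.
  used(now = Date.now()) {
    this.entries = this.entries.filter(e => now - e.ts < this.windowMs);
    return this.entries.reduce((sum, e) => sum + e.tokens, 0);
  }

  record(tokens, now = Date.now()) {
    this.entries.push({ ts: now, tokens });
  }

  allows(now = Date.now()) {
    return this.used(now) < this.limit;
  }
}

// One budget per window; a provider is eligible only if every window allows it.
// The limits here are illustrative.
const perMinute = new TokenBudget(1000, 60000);
const perFiveMinutes = new TokenBudget(3000, 300000);
const withinBudget = now => perMinute.allows(now) && perFiveMinutes.allows(now);
```

Storing the timestamp next to the token count is the key design point: the window filters by time, while the budget sums tokens.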

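VII. Failure retry (many people get this wrong)

❌ Wrong

failure → retry the same provider directly

✅ Right

async function retry(request, tried = []) {
  // Only consider providers that have not been tried for this request
  const candidates = pool.filter(p => !tried.includes(p))
  if (!candidates.length) throw new Error("ALL_PROVIDERS_FAILED")

  const next = pickProvider(candidates)
  try {
    return await call(next, request)
  } catch (err) {
    return retry(request, [...tried, next])
  }
}

Plus:

exponential backoff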
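
Failover and backoff can be combined in one loop. The sketch below assumes a caller-supplied `call(provider, request)` function and illustrative delay constants; `backoffDelay` and `callWithFailover` are hypothetical names.

```javascript
// Delay doubles with each attempt and is capped, so retries never wait unboundedly.
function backoffDelay(attempt, baseMs = 200, capMs = 5000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Try each provider at most once, waiting longer after every failure.
// `call` is a caller-supplied async function: (provider, request) => response.
async function callWithFailover(request, providers, call, maxAttempts = 3) {
  const tried = new Set();
  let lastError;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const next = providers.find(p => !tried.has(p));
    if (!next) break; // nothing left to try
    tried.add(next);

    try {
      return await call(next, request);
    } catch (err) {
      lastError = err;
      await sleep(backoffDelay(attempt));
    }
  }

  throw lastError || new Error("NO_AVAILABLE_PROVIDER");
}
```

In production, adding random jitter to the delay is a common refinement so that many clients do not retry in lockstep.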

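VIII. Stability optimization (critical)

✨ Provider state machine

healthy → degraded → unhealthy → cooldown

Example

if (failureRate > 0.5) {
  provider.state = "unhealthy"
} else if (latencySpike) {
  // A latency spike alone only degrades; it must not override "unhealthy"
  provider.state = "degraded"
}

When scheduling:

filter(p => p.state === "healthy")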
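
The four states can be made explicit as a pure transition function, which also makes the recovery path visible. The sketch below is one possible reading of the chain: the thresholds mirror the snippets in this message, while `transition`, `COOLDOWN_MS`, and the `stats` shape are assumptions.

```javascript
const COOLDOWN_MS = 60000;

// Pure transition: returns the provider's next state without mutating the input.
// stats = { failureRate, latencySpike } measured over a recent window.
function transition(p, stats, now) {
  switch (p.state) {
    case "cooldown":
      // After the cooldown expires, re-admit as degraded so it can be probed.
      return now >= p.cooldownUntil ? { ...p, state: "degraded" } : p;
    case "unhealthy":
      // An unhealthy provider is parked for a fixed cooldown period.
      return { ...p, state: "cooldown", cooldownUntil: now + COOLDOWN_MS };
    default: // "healthy" or "degraded"
      if (stats.failureRate > 0.5) return { ...p, state: "unhealthy" };
      if (stats.failureRate > 0.2 || stats.latencySpike) {
        return { ...p, state: "degraded" };
      }
      return { ...p, state: "healthy" };
  }
}
```

One design consequence worth noting: if the scheduler only admits `healthy` providers, a `degraded` one never receives traffic and so can never prove it has recovered. Routing a reduced share of traffic to degraded providers is one way to let them climb back.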

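IX. The most important optimization (which many overlook)

Request tiering


Categories:

A: fast response (chat)
B: long text (writing)
C: low value (testing)

Routing:

if (task.type === "chat") {
  prefer(lowLatencyProviders)
}

if (task.type === "long") {
  prefer(highThroughputProviders)
}

This step can:

save 30%+ of cost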
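
The tiering step needs two pieces: a classifier and a per-class ordering of providers. The sketch below is illustrative only. The request shape (`messages`, `max_tokens`, `tags`), the thresholds, and the `tokensPerSecond` field are assumptions, not fields confirmed by the project.

```javascript
// Hypothetical request shape: { messages, max_tokens, tags }. Thresholds are illustrative.
function classifyRequest(req) {
  if (req.tags && req.tags.includes("test")) return "low-value";
  const promptChars = (req.messages || [])
    .reduce((n, m) => n + String(m.content || "").length, 0);
  if ((req.max_tokens || 0) > 2000 || promptChars > 8000) return "long";
  return "chat";
}

// Order candidates by the metric that matters most for the class.
function rankForClass(pool, cls) {
  const key = {
    chat: p => p.avgFirstTokenTime,      // lowest time to first token first
    long: p => -p.tokensPerSecond,       // highest throughput first
    "low-value": p => p.costPer1kTokens, // cheapest first
  }[cls];
  return [...pool].sort((a, b) => key(a) - key(b));
}

// Illustrative providers.
const tierPool = [
  { name: "a", avgFirstTokenTime: 200, tokensPerSecond: 30, costPer1kTokens: 2 },
  { name: "b", avgFirstTokenTime: 800, tokensPerSecond: 90, costPer1kTokens: 0.5 },
];
console.log(rankForClass(tierPool, classifyRequest({ messages: [{ content: "hi" }] }))[0].name);
```

The ranked list can then feed the weighted picker, for example by restricting it to the top few candidates for the class.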

X. Final architecture (what yours should look like)

Request
↓
Cache Layer (L1/L2/L3)
↓
Router (score + bandit)
↓
Provider Pool (state machine)
↓
Executor
↓
Metrics Collector
↓
Score Update

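XI. Key components you're missing now

Let me point them out:

❌ score system (you're only recording data now)
❌ dynamic routing (you may be on round-robin)
❌ semantic cache
❌ provider state machine
❌ token budget control
❌ request tiering

XII. If you only do 3 things (priority)

The three most valuable:

1️⃣ add score + weighted routing
2️⃣ add a semantic cache (embedding)
3️⃣ build a provider state machine

A final note (important)

Your system is already heading toward:

 an "AI scheduling system"

One step further is:

 an inference gateway (similar to the one inside OpenAI)

If you want, my next step can be to build you:

 Token Pool V2 (full code structure)
- score engine
- router
- cache (including embedding)
- provider state machine
- metrics pipeline

And one that plugs straight into your current project.

OK, below is a Token Pool V2 (a deployable project structure + core code skeleton). You can drop it into your current project and run it, then gradually replace your existing scheduling layer.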

I'll follow your current engineering style: modular / loosely coupled / replaceable


I. Overall architecture (V2)

token-pool/
  ├─ core/
  │   ├─ router.js              # scheduling entry point (the only entry)
  │   ├─ score-engine.js        # scoring system
  │   ├─ state-manager.js       # provider state machine
  │   ├─ budget-manager.js      # token rate limiting
  │
  ├─ cache/
  │   ├─ l1-exact-cache.js
  │   ├─ l2-semantic-cache.js
  │
  ├─ provider/
  │   ├─ provider-pool.js
  │   ├─ provider-client.js
  │
  ├─ metrics/
  │   ├─ collector.js
  │   ├─ aggregator.js
  │
  ├─ scheduler/
  │   ├─ bandit.js              # weighted + exploration
  │
  └─ index.js                  # public entry point

II. Core entry point (Router)

// core/router.js
 
const { pickProvider } = require("../scheduler/bandit")
const { computeScore } = require("./score-engine")
const { getProviderPool } = require("../provider/provider-pool")
const { checkBudget } = require("./budget-manager")
const { getCached, setCache } = require("../cache/l1-exact-cache")
const { semanticSearch } = require("../cache/l2-semantic-cache")
const { collectMetrics } = require("../metrics/collector")
 
async function routeRequest(req) {
  // 1️⃣ L1 cache
  const cached = getCached(req)
  if (cached) return cached
 
  // 2️⃣ L2 semantic cache
  const similar = await semanticSearch(req)
  if (similar) return similar
 
  // 3️⃣ provider pool
  const pool = getProviderPool()
 
  // 4️⃣ compute score
  pool.forEach(p => {
    p.score = computeScore(p)
  })
 
  // 5️⃣ filter unhealthy + budget
  const candidates = pool.filter(p =>
    p.state === "healthy" &&
    checkBudget(p)
  )
 
  if (!candidates.length) {
    throw new Error("NO_AVAILABLE_PROVIDER")
  }
 
  // 6️⃣ pick provider
  const provider = pickProvider(candidates)
 
  const start = Date.now()
 
  try {
    const res = await provider.call(req)
 
    const latency = Date.now() - start
 
    // 7️⃣ metrics
    collectMetrics(provider, {
      success: true,
      latency,
      tokens: res.usage?.total_tokens || 0
    })
 
    // 8️⃣ cache
    setCache(req, res)
 
    return res
 
  } catch (err) {
    collectMetrics(provider, { success: false })
 
    provider.failures++
 
    // fallback retry
    return retry(req, provider)
  }
}
 
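async function retry(req, failedProvider) {
  // Fail over to a different provider that is still healthy and within budget
  const pool = getProviderPool().filter(p =>
    p !== failedProvider && p.state === "healthy" && checkBudget(p)
  )
  if (!pool.length) throw new Error("ALL_PROVIDER_FAILED")

  const next = pickProvider(pool)
  try {
    const res = await next.call(req)
    collectMetrics(next, { success: true })
    return res
  } catch (err) {
    // Record the second failure too, then surface the error to the caller
    collectMetrics(next, { success: false })
    throw err
  }
}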
 
module.exports = { routeRequest }

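III. Score Engine (core)

// core/score-engine.js

function computeScore(p) {
  const successRate = (p.success || 0) / (p.requests || 1)

  // Missing latency metrics default to 0, which scores an unmeasured provider
  // optimistically instead of producing NaN.
  // Note: with latencies in milliseconds these two terms are tiny, so normalize
  // units if they should carry weight comparable to successRate.
  const latencyScore = 1 / ((p.avgFirstTokenTime || 0) + 1)
  const speedScore = 1 / ((p.avgTotalTime || 0) + 1)

  const cacheScore = p.cacheHitRate || 0

  const costWeight = 1 / (p.costPer1kTokens || 1)

  let stability = 1
  if (p.failures > 5) stability *= 0.5
  if (p.state === "degraded") stability *= 0.7

  return (
    successRate * 0.3 +
    latencyScore * 0.2 +
    speedScore * 0.2 +
    cacheScore * 0.1 +
    costWeight * 0.2
  ) * stability
}

module.exports = { computeScore }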

IV. Bandit scheduling (key point)

// scheduler/bandit.js

function pickProvider(pool) {
  // 10% exploration
  if (Math.random() < 0.1) {
    return pool[Math.floor(Math.random() * pool.length)]
  }

  // weighted random
  const total = pool.reduce((sum, p) => sum + p.score, 0)

  let rand = Math.random() * total

  for (const p of pool) {
    rand -= p.score
    if (rand <= 0) return p
  }

  // Fallback for floating-point rounding or NaN scores
  return pool[0]
}

module.exports = { pickProvider }

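V. Provider state machine

// core/state-manager.js

function updateState(p) {
  // Note: failures and requests are lifetime counters, so this rate reacts slowly
  const failureRate = (p.failures || 0) / (p.requests || 1)

  if (failureRate > 0.5) {
    p.state = "unhealthy"
  } else if (failureRate > 0.2) {
    p.state = "degraded"
  } else {
    p.state = "healthy"
  }

  // Cooldown recovery: 60 s after the last failure, allow probing again
  // (assumes p.lastFail is set to Date.now() wherever a failure is recorded)
  if (p.state === "unhealthy" && Date.now() - p.lastFail > 60000) {
    p.state = "degraded"
  }
}

module.exports = { updateState }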
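
Because the failure rate above is computed from lifetime counters, a provider that failed heavily early on stays penalized long after it recovers. One possible alternative, sketched here as an assumption rather than as part of the proposal, is to compute the rate over a bounded window of recent outcomes. `OutcomeWindow` is a hypothetical name.

```javascript
// Bounded window of the most recent outcomes (true = success).
class OutcomeWindow {
  constructor(size = 20) {
    this.size = size;
    this.outcomes = [];
  }

  record(success) {
    this.outcomes.push(Boolean(success));
    // Drop the oldest outcome once the window is full.
    if (this.outcomes.length > this.size) this.outcomes.shift();
  }

  failureRate() {
    if (this.outcomes.length === 0) return 0;
    const failures = this.outcomes.filter(ok => !ok).length;
    return failures / this.outcomes.length;
  }
}

// Four failures followed by four successes, with a window of 4.
const recent = new OutcomeWindow(4);
[false, false, false, false].forEach(ok => recent.record(ok));
const rateAfterFailures = recent.failureRate();
[true, true, true, true].forEach(ok => recent.record(ok));
const rateAfterRecovery = recent.failureRate();
console.log(rateAfterFailures, rateAfterRecovery); // 1 0
```

With lifetime counters the same sequence would still report a failure rate of 0.5 after the recovery, which keeps the provider in the degraded band.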

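VI. Token Budget (rate limiting)

// core/budget-manager.js

const LIMIT_PER_MIN = 100000
const WINDOW_MS = 60000

function checkBudget(p) {
  const now = Date.now()

  // Sliding window: keep only usage entries from the last minute
  p.usage = (p.usage || []).filter(u => now - u.ts < WINDOW_MS)

  const used = p.usage.reduce((sum, u) => sum + u.tokens, 0)

  return used < LIMIT_PER_MIN
}

// Store the timestamp with the token count so the window can expire entries
function recordUsage(p, tokens) {
  p.usage = p.usage || []
  p.usage.push({ ts: Date.now(), tokens })
}

module.exports = { checkBudget, recordUsage }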

VII. Cache (simplified)

L1 exact cache

// cache/l1-exact-cache.js
 
const cache = new Map()
 
function hash(req) {
  return JSON.stringify(req)
}
 
function getCached(req) {
  return cache.get(hash(req))
}
 
function setCache(req, res) {
  cache.set(hash(req), res)
}
 
module.exports = { getCached, setCache }
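
JSON.stringify depends on property order, so two equivalent requests built with different key order miss each other, and the Map above never evicts anything. A minimal sketch of an order-independent key with a TTL and a size cap follows. `stableStringify`, `createExactCache`, `ttlMs`, and `maxEntries` are hypothetical names and defaults.

```javascript
// Serialize with object keys sorted so { a: 1, b: 2 } and { b: 2, a: 1 }
// produce the same cache key. Arrays keep their order.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(stableStringify).join(',') + ']'
  }
  if (value && typeof value === 'object') {
    const keys = Object.keys(value).sort()
    return '{' + keys.map(k => JSON.stringify(k) + ':' + stableStringify(value[k])).join(',') + '}'
  }
  const text = JSON.stringify(value)
  return text === undefined ? 'null' : text
}

// Hypothetical L1 cache with a TTL and a size cap (oldest entry evicted first).
function createExactCache({ ttlMs = 60000, maxEntries = 1000 } = {}) {
  const store = new Map()
  return {
    get(req, now = Date.now()) {
      const key = stableStringify(req)
      const hit = store.get(key)
      if (!hit) return undefined
      if (now - hit.at >= ttlMs) {
        store.delete(key)
        return undefined
      }
      return hit.value
    },
    set(req, value, now = Date.now()) {
      const key = stableStringify(req)
      store.delete(key) // re-insert so the entry moves to the newest position
      store.set(key, { value, at: now })
      if (store.size > maxEntries) {
        store.delete(store.keys().next().value) // Map iterates in insertion order
      }
    }
  }
}

const exactCache = createExactCache({ ttlMs: 1000 })
exactCache.set({ model: 'm', prompt: 'hi' }, 'cached', 0)
console.log(exactCache.get({ prompt: 'hi', model: 'm' }, 500))  // cached
console.log(exactCache.get({ prompt: 'hi', model: 'm' }, 1500)) // undefined
```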

L2 semantic cache (interface placeholder)

// cache/l2-semantic-cache.js
 
async function semanticSearch(req) {
  // TODO: wire up embeddings
  return null
}
 
module.exports = { semanticSearch }
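
To make the placeholder concrete, here is a hedged sketch of what a semantic lookup could look like once an embedding source exists. No particular embedding API is assumed: `embed` is an injected function, and `createSemanticCache`, `toyEmbed`, and the 0.95 threshold are hypothetical. The linear scan is only reasonable for small caches.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical L2 cache: `embed` is an async function that returns a vector.
function createSemanticCache(embed, threshold = 0.95) {
  const entries = [] // { vector, response }
  return {
    async search(prompt) {
      const vector = await embed(prompt)
      let best = null
      let bestScore = threshold
      for (const entry of entries) {
        const score = cosineSimilarity(vector, entry.vector)
        if (score >= bestScore) {
          best = entry
          bestScore = score
        }
      }
      return best ? best.response : null
    },
    async store(prompt, response) {
      entries.push({ vector: await embed(prompt), response })
    }
  }
}

// Toy embedding for demonstration only: counts of the letters a..z.
async function toyEmbed(text) {
  const v = new Array(26).fill(0)
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97
    if (i >= 0 && i < 26) v[i]++
  }
  return v
}

const semanticCache = createSemanticCache(toyEmbed)
semanticCache.store('hello world', 'cached answer')
  .then(() => semanticCache.search('Hello, world!'))
  .then(result => console.log(result)) // cached answer
```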

8. Metrics Collection

// metrics/collector.js
 
function collectMetrics(p, data) {
  p.requests++
 
  if (data.success) {
    p.success++
  }
 
  if (data.latency) {
    p.avgTotalTime =
      (p.avgTotalTime * 0.9) + (data.latency * 0.1)
  }
 
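  if (data.tokens) {
    // Store a timestamped entry so a sliding-window budget check can expire it
    p.usage.push({ ts: Date.now(), tokens: data.tokens })
  }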
}
 
module.exports = { collectMetrics }
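
The collector above starts its moving average from whatever value avgTotalTime already holds, and it does not update avgFirstTokenTime, failures, or lastFail, which the score and state code read. A minimal sketch of a collector that covers those fields follows. `ewma`, `collectMetricsEx`, and the `firstTokenLatency` input field are hypothetical.

```javascript
// Exponentially weighted moving average. The first sample seeds the average,
// so it does not have to climb up from 0.
function ewma(previous, sample, alpha = 0.1) {
  if (previous == null || Number.isNaN(previous)) return sample
  return previous * (1 - alpha) + sample * alpha
}

// Hypothetical collector variant; field names on `p` follow the blocks above.
function collectMetricsEx(p, data, now = Date.now()) {
  p.requests = (p.requests || 0) + 1

  if (data.success) {
    p.success = (p.success || 0) + 1
  } else {
    p.failures = (p.failures || 0) + 1
    p.lastFail = now
  }

  if (typeof data.latency === 'number') {
    p.avgTotalTime = ewma(p.avgTotalTime, data.latency)
  }
  // Assumption: the caller measures time to first token separately
  if (typeof data.firstTokenLatency === 'number') {
    p.avgFirstTokenTime = ewma(p.avgFirstTokenTime, data.firstTokenLatency)
  }
}

const stats = {}
collectMetricsEx(stats, { success: true, latency: 200, firstTokenLatency: 50 }, 0)
collectMetricsEx(stats, { success: false, latency: 300 }, 1000)
console.log(stats.requests, stats.success, stats.failures, stats.lastFail) // 2 1 1 1000
```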

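9. Effects you will see right away

If you adopt this setup:

Cost: ↓ 20% ~ 60% (depending on the cache)

Stability: failures are automatically routed around bad nodes

Speed: first-token latency automatically favors fast nodes

Utilization: "good accounts" are no longer wasted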

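10. Next steps (what you should do)

In priority order:

1️⃣ Replace your current entry point with the router
2️⃣ Wire up metrics (even in memory to start with)
3️⃣ Add the provider state machine
4️⃣ Add the semantic cache last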

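One last point (important)

What you have now is no longer:

an "account pool"

but rather:

 an inference gateway (an inference scheduling layer)

If you want, as a next step I can help you add something even more powerful:

 Cost-Aware Routing (budgets + task tiering + dynamic model selection)

That would be close to a true production-grade AI gateway.

This is a suggestion. Absorb it into the README and start implementing it.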

• Updated Plan
  └ Absorbing the proposed industrial Token Pool V2 ideas into the canonical README and starting a grounded implementation in the existing routing/runtime stack.
    □ Inspect the current routing, provider pool, observability, and budget-related code to find the real integration points for score routing, provider state, and request classification
    □ Update README.md to fold in the industrial-grade routing/cost-control strategy as current design guidance without creating parallel truth
    □ Implement the first grounded V2 pieces in the existing codebase: provider scoring/state inputs, request classification, and weighted routing hooks
    □ Run targeted tests or smoke verification for the touched routing/runtime paths and report what landed vs what remains

• Running Get-Content -Raw 'E:\My Project\Token Pool\src\core\routing-runner.js'

• Running Get-Content -Raw 'E:\My Project\Token Pool\src\providers\provider-selection.js'

• Running Get-Content -Raw 'E:\My Project\Token Pool\src\providers\provider-score.js'

• Running Get-Content -Raw 'E:\My Project\Token Pool\src\providers\provider-pool-manager.js'

• Ran Get-Content -Raw 'E:\My Project\Token Pool\src\core\routing-runner.js' └ import logger from '../utils/logger.js'; import { normalizeError } from './errors.js'; … +350 lines (ctrl + t to view transcript) }

• Ran Get-Content -Raw 'E:\My Project\Token Pool\src\providers\provider-pool-manager.js' import { getServiceAdapter, getRegisteredProviders, invalidateServiceAdapter } from './adapter.js'; … +2519 lines (ctrl + t to view transcript) }

• Ran Get-Content -Raw 'E:\My Project\Token Pool\src\providers\provider-selection.js' └ export function normalizeExcludedProviderUuids(value) { if (value instanceof Set) { … +24 lines (ctrl + t to view transcript) }

• Ran Get-Content -Raw 'E:\My Project\Token Pool\src\observability\debug-store.js' └ const DEBUG_RECENT_REQUEST_LIMIT = 50;

… +445 lines (ctrl + t to view transcript)
}

• Running Get-Content -Raw 'E:\My Project\Token Pool\src\observability\request-context.js'

• Ran Get-Content -Raw 'E:\My Project\Token Pool\src\core\routing-policy.js'
  └ import fs from 'fs'; import path from 'path';
    … +233 lines (ctrl + t to view transcript)
    }

• Ran Get-Content -Raw 'E:\My Project\Token Pool\src\utils\request-attempt-budget.js'
  └ function toPositiveInteger(value, fallback = 0) { const numeric = Number.parseInt(value, 10);
    … +237 lines (ctrl + t to view transcript)
    E:\My Project\Token Pool\src\observability\request-context.js:39: budgetPressure: 0,
    … +54 lines (ctrl + t to view transcript)
    E:\My Project\Token Pool\src\handlers\request-handler.js:126: selectedProviderType: requestContext.selectedProviderType || null,

• Ran Get-Content -Raw 'E:\My Project\Token Pool\src\observability\request-context.js' └ import { randomUUID } from 'crypto';

… +81 lines (ctrl + t to view transcript)
}

• Running Get-Content 'E:\My Project\Token Pool\src\providers\provider-pool-manager.js' | Select-Object -Index (540..650) └ import deepmerge from 'deepmerge'; import logger from '../utils/logger.js'; └ _doSelectProvider(providerType, requestedModel, options) { const availableProviders = this.providerStatus[providerType] || []; … +92 lines (ctrl + t to view transcript) } if (!p.config.notSupportedModels || !Array.isArray(p.config.notSupportedModels)) {

• Ran Get-Content 'E:\My Project\Token Pool\src\providers\provider-pool-manager.js' | Select-Object -Index (540..650) │ _calculateNodeScore|routeClass" 'E:\My Project\Token Pool\src\services\service-manager.js' 'E:\My Project\Token │ Pool\src\providers\provider-pool-manager.js' └ E:\My Project\Token Pool\src\services\service-manager.js:425: const selectedProviderConfig = await providerPoolManager.selectProvider(config.MODEL_PROVIDER, actualModelName, { ...options, skipUsageCount: true }); import logger from '../utils/logger.js'; … +656 lines (ctrl + t to view transcript) }

• Ran Get-Content -Raw 'E:\My Project\Token Pool\src\observability\provider-debug.js' └ import { getConfiguredSupportedModels, normalizeModelIds } from '../providers/provider-models.js';

… +91 lines (ctrl + t to view transcript)
}

• Running Get-Content -Raw 'E:\My Project\Token Pool\tests\provider-score.unit.test.js'

• Running Get-Content -Raw 'E:\My Project\Token Pool\tests\request-handler-provider-debug.unit.test.js'

• Ran Get-Content -Raw 'E:\My Project\Token Pool\tests\debug-store-costs.unit.test.js' └ import { buildCacheScopesSnapshot, … +325 lines (ctrl + t to view transcript)

• Ran rg -n "handleContentGenerationRequest|resolveRoutingPolicy|requestContext|candidateProviders|budgetPressure|promptCacheScope| │ endpointType" 'E:\My Project\Token Pool\src\utils\common.js' └ 15:import { resolveRoutingPolicy, summarizeRoutingPolicy } from '../core/routing-policy.js'; 527: endpointType, … +57 lines (ctrl + t to view transcript) 1956: requestContext: getRequestContext(error) 2057: requestContext: getRequestContext(error)

• Ran Get-Content 'E:\My Project\Token Pool\src\utils\common.js' | Select-Object -Index (1160..1475) └ * @param {Object} CONFIG - The server configuration object. * @param {string} PROMPT_LOG_FILENAME - The prompt log filename. … +313 lines (ctrl + t to view transcript) try { const pluginManager = getPluginManager();

• Ran Get-Content 'E:\My Project\Token Pool\src\providers\provider-pool-manager.js' | Select-Object -Index (940..1048) └ throw error; } … +105 lines (ctrl + t to view transcript)

        // Always update lastUsed (keeps the LRU strategy effective and prevents concurrent requests from picking the same provider)

• Edited E:\My Project\Token Pool\src\observability\request-context.js (+3 -0)
             policyId,
    +        requestClass: null,
    +        requestPriority: null,
    +        requestProfile: null,
             disableProviderRetries: true,

• Added E:\My Project\Token Pool\src\core\request-classifier.js (+93 -0)
    +import { estimateRequestBudgetTokens } from '../utils/request-attempt-budget.js';
    +
    +function toFiniteNumber(value, fallback = 0) {
    +    const numeric = Number(value);
    +    return Number.isFinite(numeric) ? numeric : fallback;
    +}
    +
    +function getExplicitPriority(requestBody = {}) {
    +    const priority = requestBody?.priority ?? requestBody?.metadata?.priority ?? null;
    +    if (priority == null) return null;
    +    return String(priority).trim().toLowerCase() || null;
    +}
    +
    +function classifyByBudget(totalTokens) {
    +    if (totalTokens >= 24000) {
    +        return 'long';
    +    }
    +    if (totalTokens <= 6000) {
    +        return 'chat';
    +    }
    +    return 'balanced';
    +}
    +
    +function buildProfile(requestClass, priority, estimatedTokens) {
    +    const baseProfiles = {
    +        chat: {
    +            latencyWeight: 1,
    +            throughputWeight: 0.45,
    +            costWeight: 0.55,
    +            stabilityWeight: 0.9
    +        },
    +        long: {
    +            latencyWeight: 0.35,
    +            throughputWeight: 1,
    +            costWeight: 0.75,
    +            stabilityWeight: 1
    +        },
    +        background: {
    +            latencyWeight: 0.15,
    +            throughputWeight: 0.55,
    +            costWeight: 1,
    +            stabilityWeight: 0.8
    +        },
    +        balanced: {
    +            latencyWeight: 0.65,
    +            throughputWeight: 0.65,
    +            costWeight: 0.65,
    +            stabilityWeight: 0.9
    +        }
    +    };
    +
    +    const profile = {
    +        ...(baseProfiles[requestClass] || baseProfiles.balanced),
    +        requestClass,
    +        priority,
    +        estimatedTokens: toFiniteNumber(estimatedTokens, 0)
    +    };
    +
    +    if (priority === 'low' || requestClass === 'background') {
    +        profile.costWeight = Math.min(1.25, profile.costWeight + 0.25);
    +        profile.latencyWeight = Math.max(0.05, profile.latencyWeight - 0.2);
    +    }
    +
    +    if (priority === 'high') {
    +        profile.latencyWeight = Math.min(1.2, profile.latencyWeight + 0.2);
    +        profile.stabilityWeight = Math.min(1.2, profile.stabilityWeight + 0.15);
    +    }
    +
    +    return profile;
    +}
    +
    +export function classifyRequestProfile(requestBody = {}, endpointType = null) {
    +    const estimate = estimateRequestBudgetTokens(requestBody);
    +    const explicitPriority = getExplicitPriority(requestBody);
    +
    +    let requestClass = classifyByBudget(estimate.totalTokens);
    +
    +    if (endpointType === 'OPENAI_CHAT') {
    +        requestClass = 'chat';
    +    } else if (endpointType === 'GEMINI_CONTENT' && estimate.totalTokens >= 16000) {
    +        requestClass = 'long';
    +    }
    +
    +    const priority = explicitPriority || (requestClass === 'background' ? 'low' : 'normal');
    +    return {
    +        requestClass,
    +        requestPriority: priority,
    +        estimatedTokens: estimate.totalTokens,
    +        estimatedPromptTokens: estimate.inputTokens,
    +        estimatedOutputTokens: estimate.outputTokens,
    +        profile: buildProfile(requestClass, priority, estimate.totalTokens)
    +    };
    +}

• Edited E:\My Project\Token Pool\src\providers\provider-score.js (+133 -20)

    +function clamp(value, min, max) {
    +    return Math.min(max, Math.max(min, value));
    +}
    +
    +export function deriveProviderRuntimeState(providerStatus, now = Date.now()) {
    +    const config = providerStatus?.config || {};
    +    const state = providerStatus?.state || {};
    +    const successCount = toFiniteNumber(config.successCount, 0);
    +    const failCount = toFiniteNumber(config.failCount, toFiniteNumber(config.errorCount, 0));
    +    const totalAttempts = successCount + failCount;
    +    const successRate = totalAttempts > 0
    +        ? getPenaltyRatio(successCount, totalAttempts, 1)
    +        : Math.min(1, Math.max(0, toFiniteNumber(config.successRate, 1)));
    +    const recoveryTime = config.scheduledRecoveryTime ? new Date(config.scheduledRecoveryTime).getTime() : 0;
    +    const concurrencyLimit = parseInt(config.concurrencyLimit || 0, 10);
    +    const activeCount = toFiniteNumber(state.activeCount, 0);
    +    const waitingCount = toFiniteNumber(state.waitingCount, 0);
    +
    +    if (config.isDisabled) return 'disabled';
    +    if (!config.isHealthy) {
    +        if (Number.isFinite(recoveryTime) && recoveryTime > now) {
    +            return 'cooldown';
    +        }
    +        return 'unhealthy';
    +    }
    +
    +    if (config.needsRefresh) return 'degraded';
    +    if (totalAttempts >= 5 && successRate < 0.65) return 'degraded';
    +    if (concurrencyLimit > 0 && activeCount >= concurrencyLimit) return 'degraded';
    +    if (waitingCount > 0) return 'degraded';
    +    return 'healthy';
    +}
    +
     export function computeProviderRuntimeMetrics(providerStatus, now = Date.now()) {
    ⋮
             : 0;
    +    const runtimeState = deriveProviderRuntimeState(providerStatus, now);

    ⋮
             queuePressurePenalty,
    -        queueOverflowPenalty
    +        queueOverflowPenalty,
    +        runtimeState
         };
    ⋮

    -export function scoreProviderNode(providerStatus, {
    +export function buildProviderScoreBreakdown(providerStatus, {
         now = Date.now(),
    ⋮
         pool = [],
    -    budgetPressure = 0
    +    budgetPressure = 0,
    +    requestProfile = null
     } = {}) {
    ⋮
         if (!config.isHealthy || config.isDisabled) {
    -        return 1e18;
    +        return {
    +            sortableScore: 1e18,
    +            selectionWeight: 0,
    +            runtimeState: metrics.runtimeState,
    +            blocked: true,
    +            blockedReason: !config.isHealthy ? metrics.runtimeState : 'disabled'
    +        };
         }
    ⋮
         if (Number.isFinite(recoveryTime) && recoveryTime > now) {
    -        return 1e18;
    +        return {
    +            sortableScore: 1e18,
    +            selectionWeight: 0,
    +            runtimeState: metrics.runtimeState,
    +            blocked: true,
    +            blockedReason: 'scheduled_recovery'
    +        };
         }
    ⋮

    -    if (config.scheduledRecoveryTime && !config.isHealthy) {
    -        return 1e18;
    -    }
    -
         if (metrics.concurrencyLimit > 0 && metrics.activeCount >= metrics.concurrencyLimit) {
             if (metrics.queueLimit > 0 && metrics.waitingCount >= metrics.queueLimit) {
    -            return 1e17;
    +            return {
    +                sortableScore: 1e17,
    +                selectionWeight: 0,
    +                runtimeState: metrics.runtimeState,
    +                blocked: true,
    +                blockedReason: 'queue_full'
    +            };
             }
    -        return 1e15 + metrics.waitingCount * 1e10;
         }

    +    const profile = {
    +        latencyWeight: 0.65,
    +        throughputWeight: 0.65,
    +        costWeight: 0.65,
    +        stabilityWeight: 0.9,
    +        ...(requestProfile && typeof requestProfile === 'object' ? requestProfile : {})
    +    };
    +
         const priorityScore = metrics.priority * 1e13;
    ⋮
         const weightAdjustment = metrics.weight > 0 ? 1000 / metrics.weight : 1000;
    -    const budgetPenalty = Math.max(0, toFiniteNumber(budgetPressure, 0)) * metrics.costPenalty * 4;
    +    const budgetPenalty = Math.max(0, toFiniteNumber(budgetPressure, 0)) * metrics.costPenalty * (2.5 + profile.costWeight * 2.5);
    +    const latencyPenalty = metrics.latencyPenalty * (0.75 + profile.latencyWeight);
    +    const loadPenalty = (metrics.loadPenalty + metrics.queuePressurePenalty + metrics.queueOverflowPenalty)
    +        * (0.7 + profile.throughputWeight);
    +    const stabilityPenalty = (metrics.successPenalty + metrics.errorPenalty)
    +        * (0.7 + profile.stabilityWeight);
    +    const costPenalty = metrics.costPenalty * (0.55 + profile.costWeight);
    +    const degradedPenalty = metrics.runtimeState === 'degraded'
    +        ? 12000 * (0.6 + profile.stabilityWeight)
    +        : 0;

    -    return priorityScore
    +    const sortableScore = priorityScore
             + freshnessBaseScore
    ⋮
             + sequenceScore
    -        + metrics.loadPenalty
    -        + metrics.successPenalty
    -        + metrics.latencyPenalty
    -        + metrics.costPenalty
    -        + metrics.errorPenalty
    -        + metrics.queuePressurePenalty
    -        + metrics.queueOverflowPenalty
    +        + latencyPenalty
    +        + loadPenalty
    +        + stabilityPenalty
    +        + costPenalty
             + budgetPenalty
    +        + degradedPenalty
             + freshBonus
             + weightAdjustment;
    +
    +    const desirability = 1 / Math.max(1, sortableScore / 1000);
    +    const selectionWeight = clamp(desirability, 0.000001, 1);
    +
    +    return {
    +        sortableScore,
    +        selectionWeight,
    +        runtimeState: metrics.runtimeState,
    +        blocked: false,
    +        blockedReason: null,
    +        components: {
    +            priorityScore,
    +            freshnessBaseScore,
    +            usagePenalty: metrics.usagePenalty,
    +            sequenceScore,
    +            latencyPenalty,
    +            loadPenalty,
    +            stabilityPenalty,
    +            costPenalty,
    +            budgetPenalty,
    +            degradedPenalty,
    +            freshBonus,
    +            weightAdjustment
    +        },
    +        metrics: {
    +            successRate: metrics.successRate,
    +            avgLatencyMs: metrics.avgLatencyMs,
    +            activeCount: metrics.activeCount,
    +            waitingCount: metrics.waitingCount,
    +            costWeight: metrics.costWeight,
    +            budgetPressure: toFiniteNumber(budgetPressure, 0)
    +        }
    +    };
    +}
    +
    +export function scoreProviderNode(providerStatus, {
    +    now = Date.now(),
    +    minSeqInPool = -1,
    +    pool = [],
    +    budgetPressure = 0,
    +    requestProfile = null
    +} = {}) {
    +    return buildProviderScoreBreakdown(providerStatus, {
    +        now,
    +        minSeqInPool,
    +        pool,
    +        budgetPressure,
    +        requestProfile
    +    }).sortableScore;
     }
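
A quick numeric check of how the diff above converts a sortable score (lower is better) into a sampling weight (higher is better). The two expressions are copied from the diff, and `toSelectionWeight` is a hypothetical wrapper added only for this illustration.

```javascript
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value))
}

// Mirrors the desirability and selectionWeight lines from the diff.
function toSelectionWeight(sortableScore) {
  const desirability = 1 / Math.max(1, sortableScore / 1000)
  return clamp(desirability, 0.000001, 1)
}

console.log(toSelectionWeight(500))   // 1 (scores up to 1000 all map to 1)
console.log(toSelectionWeight(4000))  // 0.25
console.log(toSelectionWeight(1e13))  // 0.000001 (the floor applies from 1e9 upward)
console.log(toSelectionWeight(2e13))  // 0.000001
```

Since the same diff computes priorityScore as metrics.priority * 1e13, any provider with a nonzero priority term would sit on the 0.000001 floor, and weighted sampling among such providers would be effectively uniform. Whether that matters depends on the priority values in use, which the transcript does not show.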

• Edited E:\My Project\Token Pool\src\observability\provider-debug.js (+2 -0)
     import { getConfiguredSupportedModels, normalizeModelIds } from '../providers/provider-models.js';
    +import { deriveProviderRuntimeState } from '../providers/provider-score.js';

    ⋮
             isHealthy: entry.config?.isHealthy ?? null,
    +        runtimeState: deriveProviderRuntimeState(entry),
             isDisabled: entry.config?.isDisabled ?? false,

• Edited E:\My Project\Token Pool\src\providers\provider-pool-manager.js (+79 -18)
         applyProviderSuccessMetrics,
    +    buildProviderScoreBreakdown,
         scoreProviderNode
    ⋮
                 pool: this.providerStatus[providerStatus.type] || [],
    -            budgetPressure: requestContext?.budgetPressure || 0
    +            budgetPressure: requestContext?.budgetPressure || 0,
    +            requestProfile: requestContext?.requestProfile || null
             });
    ⋮

    +    _buildProviderCandidate(providerStatus, providerType, now, minSeq, requestContext) {
    +        const breakdown = buildProviderScoreBreakdown(providerStatus, {
    +            now,
    +            minSeqInPool: minSeq,
    +            pool: this.providerStatus[providerType] || [],
    +            budgetPressure: requestContext?.budgetPressure || 0,
    +            requestProfile: requestContext?.requestProfile || null
    +        });
    +
    +        return {
    +            providerStatus,
    +            providerId: providerStatus?.config?.uuid || null,
    +            providerType,
    +            customName: providerStatus?.config?.customName || null,
    +            score: breakdown.sortableScore,
    +            selectionWeight: breakdown.selectionWeight,
    +            runtimeState: breakdown.runtimeState,
    +            scoreComponents: breakdown.components || null,
    +            scoreMetrics: breakdown.metrics || null
    +        };
    +    }
    +
    +    _pickCandidateByWeight(candidates = []) {
    +        const explorationRate = Math.max(0, Math.min(1, Number(this.globalConfig?.ROUTER_EXPLORATION_RATE ?? 0)));
    +        if (candidates.length <= 1) {
    +            return candidates[0] || null;
    +        }
    +
    +        if (explorationRate > 0 && Math.random() < explorationRate) {
    +            const randomIndex = Math.floor(Math.random() * candidates.length);
    +            return candidates[randomIndex];
    +        }
    +
    +        const totalWeight = candidates.reduce((sum, candidate) => {
    +            return sum + Math.max(0, Number(candidate.selectionWeight) || 0);
    +        }, 0);
    +
    +        if (!(totalWeight > 0)) {
    +            return candidates
    +                .slice()
    +                .sort((left, right) => left.score - right.score || String(left.providerId).localeCompare(String(right.providerId)))[0] || null;
    +        }
    +
    +        let cursor = Math.random() * totalWeight;
    +        for (const candidate of candidates) {
    +            cursor -= Math.max(0, Number(candidate.selectionWeight) || 0);
    +            if (cursor <= 0) {
    +                return candidate;
    +            }
    +        }
    +
    +        return candidates[candidates.length - 1] || null;
    +    }
    +
         /**
    ⋮

    +        if (availableAndHealthyProviders.length === 0) {
    +            this._log('warn', `No available and healthy providers for type: ${providerType}`);
    +            return null;
    +        }
    +
    +        const rankedCandidates = availableAndHealthyProviders
    +            .map(provider => this._buildProviderCandidate(provider, providerType, now, minSeq, requestContext))
    +            .sort((left, right) => left.score - right.score || String(left.providerId).localeCompare(String(right.providerId)));
    +
             if (requestContext && typeof requestContext === 'object') {
    -            requestContext.candidateProviders = availableAndHealthyProviders.map(provider => ({
    -                providerId: provider?.config?.uuid || null,
    -                providerType,
    -                score: this._calculateNodeScore(provider, now, minSeq, requestContext)
    +            requestContext.candidateProviders = rankedCandidates.map(candidate => ({
    +                providerId: candidate.providerId,
    +                providerType: candidate.providerType,
    +                customName: candidate.customName,
    +                score: candidate.score,
    +                selectionWeight: candidate.selectionWeight,
    +                runtimeState: candidate.runtimeState,
    +                scoreMetrics: candidate.scoreMetrics
                 }));
    ⋮

    -        if (availableAndHealthyProviders.length === 0) {
    -            this._log('warn', `No available and healthy providers for type: ${providerType}`);
    +        const selectedCandidate = this._pickCandidateByWeight(rankedCandidates);
    +        const selected = selectedCandidate?.providerStatus || rankedCandidates[0]?.providerStatus || null;
    +        if (!selected) {
    +            this._log('warn', `Weighted routing failed to resolve a provider for type: ${providerType}`);
                 return null;
    ⋮

    -        // Improvement: select using the unified scoring strategy
    ⋮
    -        })[0];
    -
             // Always update lastUsed (keeps the LRU strategy effective and prevents concurrent requests from picking the same provider)
    ⋮

    -        this._log('debug', `Selected provider for ${providerType} (LRU): ${selected.config.uuid}${requestedModel ? ` for model: ${requestedModel}` : ''}${options.skipUsageCount ? ' (skip usage count)' : ''}`);
    +        this._log('debug', `Selected provider for ${providerType} (score routing): ${selected.config.uuid}${requestedModel ? ` for model: ${requestedModel}` : ''}${options.skipUsageCount ? ' (skip usage count)' : ''}`);


• Edited E:\My Project\Token Pool\src\observability\debug-store.js (+2 -0)
             keyId: entry?.keyId || requestContext.keyId || null,
    +        requestClass: requestContext.requestClass || null,
    +        requestPriority: requestContext.requestPriority || null,
             providerId: requestContext.selectedProviderId || entry?.selectedProviderId || null,

• Ran Get-Content 'E:\My Project\Token Pool\src\utils\common.js' | Select-Object -First 40 └ export { MODEL_PROTOCOL_PREFIX, MODEL_PROVIDER } from './constants.js'; import { promises as fs } from 'fs'; … +36 lines (ctrl + t to view transcript) 'ECONNREFUSED', // connection refused 'ENOTFOUND', // DNS resolution failed

• Edited E:\My Project\Token Pool\src\utils\common.js (+1 -0)
     import { resolveRoutingPolicy, summarizeRoutingPolicy } from '../core/routing-policy.js';
    +import { classifyRequestProfile } from '../core/request-classifier.js';

• Edited E:\My Project\Token Pool\src\utils\common.js (+15 -2)
         logger.info(`[Content Generation] Model: ${model}, Stream: ${isStream}`);
    -
    -    let actualCustomName = CONFIG.customName;
    +    const requestClassification = classifyRequestProfile(originalRequestBody, endpointType);
    +    mergeRequestContext(requestContext, {
    +        requestClass: requestClassification.requestClass,
    +        requestPriority: requestClassification.requestPriority,
    +        requestProfile: requestClassification.profile,
    ⋮
    +        estimatedOutputBudgetTokens: requestClassification.estimatedOutputTokens
    +    });
    +    logger.info(`[Routing Profile] ${JSON.stringify({
    ⋮
    +    })}`);
    +
    +    let actualCustomName = CONFIG.customName;
         let releaseCrsRequestPolicy = null;

• Edited E:\My Project\Token Pool\src\providers\provider-pool-manager.js (+7 -0)
         _pickCandidateByWeight(candidates = []) {
    +        const selectionMode = String(this.globalConfig?.ROUTER_SELECTION_MODE || 'ranked').trim().toLowerCase();
    +        if (selectionMode !== 'weighted') {
    +            return candidates
    +                .slice()
    +                .sort((left, right) => left.score - right.score || String(left.providerId).localeCompare(String(right.providerId)))[0] || null;
    +        }
    +
             const explorationRate = Math.max(0, Math.min(1, Number(this.globalConfig?.ROUTER_EXPLORATION_RATE ?? 0)));

■ stream disconnected before completion: stream closed before response.completed

› Continue

■ stream disconnected before completion: stream closed before response.completed