NOTE

Stitching Outline

2026/05/21 49 min read

Scope note

ContentMRS/DataBase and ContentMRS/ContentBase are present as nested repos (not only docs). Integration is partially live: Gateway owns EvidencePack; novel runtime consumes it via createDataBaseGatewayClient().searchEvidencePack(). Vendor trees under E:\My Project\ContentMRS\vendor\ are read-only references per vendor/README.md—copy logic, not whole apps.


1. Vendor entry files (copy FROM)

vendor/open-notebook — notebook / collection scope

FROM path | Purpose
E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\domain\notebook.py | Notebook model; get_sources() / get_notes() via Surreal reference / artifact edges = hard notebook boundary
E:\My Project\ContentMRS\vendor\open-notebook\api\routers\notebooks.py | REST CRUD for notebooks; scope entry for operators
E:\My Project\ContentMRS\vendor\open-notebook\api\routers\sources.py | Source ingest/upload; binds materials to notebook(s)
E:\My Project\ContentMRS\vendor\open-notebook\api\routers\context.py | POST /notebooks/{id}/context: configurable source/note inclusion = scope DSL
E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\utils\context_builder.py | Generic context assembly with token budget and inclusion levels
E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\domain\notebook.py (vector_search, text_search) | Scoped keyword/vector search over sources/notes only
E:\My Project\ContentMRS\vendor\open-notebook\api\routers\search.py | HTTP wrapper for scoped search + multi-step ask graph
E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\graphs\ask.py | LangGraph: strategy → parallel sub-searches → final cited answer
E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\graphs\source.py | Ingest pipeline: extract → transform → embed; notebook_ids on state
E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\utils\embedding.py | Embedding for scoped vector search
E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\utils\chunking.py | Chunk boundaries for notebook-local retrieval
E:\My Project\ContentMRS\vendor\open-notebook\api\main.py | FastAPI app wiring (deployment shape reference only)

vendor/ragflow — dataset retrieval (hard boundary)

FROM path | Purpose
E:\My Project\ContentMRS\vendor\ragflow\api\apps\sdk\doc.py | POST /api/v1/retrieval: the same contract Gateway already calls (dataset_ids, question, thresholds)
E:\My Project\ContentMRS\vendor\ragflow\api\apps\services\dataset_api_service.py (search_datasets) | Core hybrid retrieval implementation + KG branch
E:\My Project\ContentMRS\vendor\ragflow\api\apps\restful_apis\dataset_api.py | POST /api/v1/datasets/search and per-dataset search: dataset-scoped test API
E:\My Project\ContentMRS\vendor\ragflow\api\db\services\knowledgebase_service.py | Dataset (knowledge base) lifecycle, access control, parser config
E:\My Project\ContentMRS\vendor\ragflow\api\db\services\document_service.py | Document-in-dataset indexing state
E:\My Project\ContentMRS\vendor\ragflow\api\apps\sdk\dify_retrieval.py | External retrieval adapter (Dify-shaped); useful for normalizing request/response
E:\My Project\ContentMRS\vendor\ragflow\docs\references\http_api_reference.md | Canonical HTTP field names for retrieval payloads
E:\My Project\ContentMRS\vendor\ragflow\web\src\services\knowledge-service.ts | Client-side retrievalTest → datasets/search (UI contract mirror)

vendor/paper-qa — cited research / EvidencePack shape

FROM path | Purpose
E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\agents\main.py | agent_query / run_agent: multi-tool research loop entry
E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\agents\tools.py | GatherEvidence, GenerateAnswer, PaperSearch; session + citation state
E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\agents\search.py | Local index / document store for corpus-scoped search
E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\docs.py | Docs collection = corpus boundary; chunking + retrieval over owned docs
E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\types.py | Context, PQASession, citation keys, bibliographic metadata
E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\core.py | Evidence summarization + relevance scoring JSON
E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\prompts.py | Citation-key constraints and answer-with-evidence prompts
E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\settings.py | Agent/index settings surface
E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\clients\openalex.py (and crossref.py, semantic_scholar.py) | External metadata enrichment (optional Tier-2)

vendor/storm — multi-step research → outline → cited article

FROM path | Purpose
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\engine.py | STORMWikiRunner orchestrates full pipeline
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\knowledge_curation.py | Multi-turn persona dialogue + retriever calls = research phase
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\persona_generator.py | Multi-perspective question generation
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\retriever.py | Retrieval adapter + source-quality rules
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\outline_generation.py | Outline from collected information table
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\article_generation.py | Section-wise generation with per-section retrieval
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\article_polish.py | Second-pass polish with references
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\storm_dataclass.py | StormInformationTable, StormArticle: research artifacts
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\interface.py | Information / Retriever / module interfaces
E:\My Project\ContentMRS\vendor\storm\knowledge_storm\rm.py | Search backends (Tavily, Bing, etc.); web leg only
E:\My Project\ContentMRS\vendor\storm\examples\storm_examples\run_storm_wiki_gpt.py | Minimal runnable wiring reference

2. Current integration points (already in repo)

ContentMRS/DataBase/apps/gateway

Path | Purpose
E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes\evidence.ts | EvidencePack hub: MySQL search_chunks, semantic units, web provider, RAGFlow /api/v1/retrieval, screening metadata
E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes\research.ts | POST /research/resolve-topic, GET /research/topics, POST /research/query → internal GET /evidence/search
E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\evidence-screening.ts | applyEvidencePostFilters, screenEvidenceCandidates: central-claim / exclude-query gate
E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\topic-resolver.ts | Topic → sourceIds, excludeQueries, scopeMode (transitional; vendor README says freeze per-topic growth)
E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\topic-corpus.ts | Loads config/topic-corpus.json registry
E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\category-register.ts | Category/lexicon register helpers
E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\lexicon-register.ts | Style lexicon projection helpers
E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes\search.ts | Raw search_chunks keyword API (lower-level than EvidencePack)
E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes\semantic.ts | Semantic unit reads feeding evidence path
E:\My Project\ContentMRS\DataBase\apps\gateway\src\ragflow-readiness.ts | RAGFlow dataset/embedding smoke for includeRagflow
E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes.ts | Mounts /evidence/*, /research/*
E:\My Project\ContentMRS\DataBase\apps\gateway\config\topic-corpus.json | Topic lane defaults (should shrink as notebook/dataset scope lands)
E:\My Project\ContentMRS\DataBase\packages\schemas\content\canonical-content-contract.ts | Evidence object types (contract truth)
E:\My Project\ContentMRS\DataBase\packages\database-client\src\apis\DefaultApi.ts | Generated client: searchEvidencePack, etc.
E:\My Project\ContentMRS\DataBase\docs\contracts\evidence-contract.md | Canonical EvidencePack ownership rules

evidence.ts internal seams (good split targets when extracting vendor logic):

  • searchDatabaseEvidence, searchSemanticEvidence — corpus DB scope
  • searchRagflowEvidence, appendRagflowChunks — RAGFlow dataset scope (mirrors vendor/ragflow/.../doc.py)
  • searchWebEvidence — Tavily leg
  • buildEvidenceQueries — multi-round query expansion (STORM/paper-qa pattern candidate)

ContentMRS/ContentBase/product/novel

Path | Purpose
E:\My Project\ContentMRS\ContentBase\product\novel\app\article\context.ts | resolveArticleContextFromDataBase; readArticleEvidencePack → Gateway; normalizes citations
E:\My Project\ContentMRS\ContentBase\product\novel\app\article\capability.ts | Main generate path; resolveArticleRetrievalPlan (LLM planner); research/writer/reviewer agents; material screening
E:\My Project\ContentMRS\ContentBase\product\novel\app\article\runtime.ts | Request types: evidenceQuery, retrievalPlan, topicScopeMode
E:\My Project\ContentMRS\ContentBase\product\novel\app\article\topic-preset.ts | Default rounds/limit; optional topicId preset (being de-emphasized)
E:\My Project\ContentMRS\ContentBase\product\novel\app\article\material-relevance.ts | Post-pack material filter (ContentBase-side scope gate)
E:\My Project\ContentMRS\ContentBase\product\novel\app\article\article-agent-contracts.ts | Research/writer/reviewer prompts and handoffs
E:\My Project\ContentMRS\ContentBase\product\novel\app\article\generation-workflow.ts | LangGraph stages: context → material → write → observe
E:\My Project\ContentMRS\ContentBase\product\novel\app\article\context-engineering.ts | Material function plan / argument digest
E:\My Project\ContentMRS\ContentBase\product\novel\app\article\trace.ts | Reference coverage trace (citation QA)
E:\My Project\ContentMRS\ContentBase\product\novel\core\utils\database-gateway-client.ts | Thin wrapper over @emptyinkpot/database-gateway-generated-client
E:\My Project\ContentMRS\ContentBase\product\novel\tools\generate-article-mvp.mjs | Smoke CLI (--topic only)
E:\My Project\ContentMRS\ContentBase\product\novel\tools\evidence-pack-smoke.mjs | EvidencePack integration smoke

ContentBase has no packages/ tree today; shared code lives under product/novel/app/ and core/.


3. Proposed module map (FROM → TO)

Principle from CONTRACT.md / vendor/README.md: Gateway = evidence + scope + retrieval; ContentBase = orchestration + writing + citation enforcement. Prefer small new modules over growing evidence.ts / capability.ts further.

A. Notebook / collection scope (from open-notebook)

FROM | TO | Purpose
vendor/open-notebook/open_notebook/domain/notebook.py | DataBase/apps/gateway/src/lib/material-scope.ts | Map notebookId / sourceIds / forbidden sets → EvidenceQuery filters
vendor/open-notebook/api/routers/context.py | DataBase/apps/gateway/src/routes/scope.ts (new) | POST /scope/resolve: explicit material-set boundary for ContentAdmin
vendor/open-notebook/open_notebook/utils/context_builder.py | DataBase/apps/gateway/src/lib/context-budget.ts | Token-budgeted context packing for large packs (optional)
vendor/open-notebook/open_notebook/domain/notebook.py (vector_search) | DataBase/apps/gateway/src/lib/scope-search.ts | Pattern for scoped DB search (filter by sourceIds only), not Surreal

Do not copy open-notebook UI/API wholesale into ContentAdmin; only scope + binding semantics.
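
The scope mapping in the table above can be sketched as a pure function. This is a minimal illustration, not the Gateway's actual contract: `ScopeRequestSketch`, `ScopeFilterSketch`, `resolveMaterialScope`, the `forbiddenSourceIds` field, and the `"explicit"` / `"open"` values are all hypothetical names chosen for the example, and the notebook lookup is passed in as a plain object rather than read from a database.

```typescript
// Hypothetical request/filter shapes; field names are assumptions for
// illustration, not the Gateway's actual EvidenceQuery contract.
interface ScopeRequestSketch {
  notebookId?: string;
  sourceIds?: string[];
  forbiddenSourceIds?: string[];
}

interface ScopeFilterSketch {
  allowedSourceIds: string[]; // hard boundary: search only these sources
  scopeMode: "explicit" | "open";
}

// notebookSources stands in for a lookup of notebook id -> bound source ids.
function resolveMaterialScope(
  request: ScopeRequestSketch,
  notebookSources: Record<string, string[]>,
): ScopeFilterSketch {
  const fromNotebook =
    request.notebookId !== undefined
      ? notebookSources[request.notebookId] ?? []
      : [];
  const forbidden = new Set(request.forbiddenSourceIds ?? []);
  // Union of notebook-bound and explicitly requested sources, in first-seen order.
  const merged = new Set([...fromNotebook, ...(request.sourceIds ?? [])]);
  const allowedSourceIds = Array.from(merged).filter((id) => !forbidden.has(id));
  const hasExplicitScope =
    request.notebookId !== undefined || (request.sourceIds?.length ?? 0) > 0;
  return { allowedSourceIds, scopeMode: hasExplicitScope ? "explicit" : "open" };
}
```

Under these assumptions, a request for notebook `nb-1` (bound to `s1`, `s2`) plus `sourceIds: ["s3"]` with `s2` forbidden resolves to `["s1", "s3"]`. An explicit scope that resolves to an empty list should be treated as "search nothing", not as a fallback to the whole corpus; otherwise the boundary is not hard.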

B. Dataset retrieval (from ragflow)

FROM | TO | Purpose
vendor/ragflow/api/apps/sdk/doc.py (retrieval_test) | DataBase/apps/gateway/src/lib/ragflow-retrieval.ts | Extract searchRagflowEvidence + chunk normalization from evidence.ts
vendor/ragflow/api/apps/services/dataset_api_service.py (search_datasets) | DataBase/apps/gateway/src/lib/ragflow-retrieval.ts | Reference for extra flags (use_kg, meta_data_filter)
vendor/ragflow/api/db/services/knowledgebase_service.py | DataBase/apps/gateway/src/ragflow-readiness.ts | Align readiness checks with upstream dataset/embedding rules
(existing) evidence.ts appendRagflowChunks | DataBase/packages/schemas/content/canonical-content-contract.ts | Keep one EvidenceChunk/citation shape (already here)
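
A rough sketch of what an extracted ragflow-retrieval.ts could isolate: building the request body and normalizing returned chunks, with no network call. The `dataset_ids` and `question` fields are the ones this note cites; `similarity_threshold`, `top_k`, and the chunk fields (`id`, `content`, `document_id`, `similarity`) are assumptions to be checked against http_api_reference.md. All type and function names here are hypothetical, and `NormalizedChunkSketch` is not the canonical EvidenceChunk.

```typescript
interface RagflowRetrievalOptions {
  datasetIds: string[];
  question: string;
  similarityThreshold?: number;
  topK?: number;
}

// Assumed subset of an upstream chunk; verify against http_api_reference.md.
interface RagflowChunkSketch {
  id: string;
  content: string;
  document_id: string;
  similarity: number;
}

// Hypothetical normalized shape, not the canonical EvidenceChunk.
interface NormalizedChunkSketch {
  chunkId: string;
  sourceId: string;
  text: string;
  score: number;
  origin: "ragflow";
}

function buildRetrievalBody(
  options: RagflowRetrievalOptions,
): Record<string, unknown> {
  if (options.datasetIds.length === 0) {
    // The dataset list is the hard boundary, so refuse an unscoped call.
    throw new Error("dataset_ids must not be empty");
  }
  const body: Record<string, unknown> = {
    dataset_ids: options.datasetIds,
    question: options.question,
  };
  // Optional fields are only sent when the caller set them.
  if (options.similarityThreshold !== undefined) {
    body.similarity_threshold = options.similarityThreshold;
  }
  if (options.topK !== undefined) body.top_k = options.topK;
  return body;
}

function normalizeRagflowChunks(
  chunks: RagflowChunkSketch[],
): NormalizedChunkSketch[] {
  return chunks
    .filter((chunk) => chunk.content.trim().length > 0) // drop empty chunks
    .map((chunk) => ({
      chunkId: `ragflow:${chunk.id}`, // prefix avoids id clashes with DB chunks
      sourceId: chunk.document_id,
      text: chunk.content.trim(),
      score: chunk.similarity,
      origin: "ragflow" as const,
    }));
}
```

Keeping both functions pure makes the field mapping testable without a running RAGFlow instance; the HTTP call itself can stay a few lines in the route.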

C. Cited research / EvidencePack (from paper-qa)

FROM | TO | Purpose
vendor/paper-qa/src/paperqa/agents/tools.py (GatherEvidence, GenerateAnswer) | DataBase/apps/gateway/src/lib/evidence-screening.ts | Stronger relevance scoring + rejection reasons
vendor/paper-qa/src/paperqa/prompts.py (CITATION_KEY_CONSTRAINTS) | ContentBase/product/novel/app/article/citation-contract.ts (new) | Writer/reviewer citation key rules
vendor/paper-qa/src/paperqa/types.py (PQASession, Context) | DataBase/packages/schemas/content/canonical-content-contract.ts | Extend only if contract lacks fields; avoid parallel schema
vendor/paper-qa/src/paperqa/docs.py | DataBase/apps/gateway/scripts/ (ingest job, not runtime) | Corpus ingest pattern → search_documents/search_chunks pipeline
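
To make "relevance scoring + rejection reasons" concrete, here is a minimal sketch of a screening pass that records why each candidate was kept or dropped. It assumes a relevance score already exists per candidate (for example from an LLM scorer) on a 0 to 10 scale; that scale, the default threshold of 5, and every name in the block are assumptions for illustration, not the behaviour of screenEvidenceCandidates or of paper-qa.

```typescript
interface ScoredCandidateSketch {
  chunkId: string;
  text: string;
  relevance: number; // assumed 0-10 scale, produced upstream
}

interface ScreeningDecisionSketch {
  chunkId: string;
  accepted: boolean;
  reason: "ok" | "below_threshold" | "matches_exclude_query";
}

function screenCandidates(
  candidates: ScoredCandidateSketch[],
  excludeQueries: string[],
  minRelevance = 5,
): ScreeningDecisionSketch[] {
  const excluded = excludeQueries
    .map((term) => term.trim().toLowerCase())
    .filter((term) => term.length > 0);
  return candidates.map((candidate): ScreeningDecisionSketch => {
    const text = candidate.text.toLowerCase();
    // The exclude gate runs first: a high score never rescues an excluded chunk.
    if (excluded.some((term) => text.includes(term))) {
      return { chunkId: candidate.chunkId, accepted: false, reason: "matches_exclude_query" };
    }
    if (candidate.relevance < minRelevance) {
      return { chunkId: candidate.chunkId, accepted: false, reason: "below_threshold" };
    }
    return { chunkId: candidate.chunkId, accepted: true, reason: "ok" };
  });
}
```

Returning a decision for every candidate, rather than only the survivors, keeps the rejection reasons available as screening metadata on the pack.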

D. Multi-step research (from storm)

FROM | TO | Purpose
vendor/storm/.../knowledge_curation.py | DataBase/apps/gateway/src/lib/research-query-planner.ts (new) | Multi-query / multi-round expansion before buildEvidenceQueries
vendor/storm/.../persona_generator.py | DataBase/apps/gateway/src/routes/research.ts | Optional sub-query perspectives (replace topic-regex inference)
vendor/storm/.../outline_generation.py | ContentBase/product/novel/app/article/research-outline.ts (new) | Outline artifact between research and write (not in Gateway)
vendor/storm/.../article_generation.py | ContentBase/product/novel/app/article/capability.ts | Sectioned write with per-section retrieval (thin hooks only)
vendor/storm/knowledge_storm/rm.py | DataBase/apps/web-evidence-provider/ | Web search already split; keep STORM RM out of novel core
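
The multi-perspective, multi-turn expansion could look roughly like the loop below. This is a simplified sketch of the idea, not a port of STORM: the question generator is injected as `propose` (a stand-in for an LLM call, kept synchronous here for brevity), and `PlannedQuerySketch`, `QuestionProposer`, and `planResearchQueries` are hypothetical names.

```typescript
interface PlannedQuerySketch {
  perspective: string;
  turn: number;
  query: string;
}

// Stand-in for an LLM call that proposes the next question for a perspective,
// given what that perspective has already asked. Returns null when exhausted.
type QuestionProposer = (
  topic: string,
  perspective: string,
  asked: string[],
) => string | null;

function planResearchQueries(
  topic: string,
  perspectives: string[],
  maxTurns: number,
  propose: QuestionProposer,
): PlannedQuerySketch[] {
  const planned: PlannedQuerySketch[] = [];
  const seen = new Set<string>();
  for (const perspective of perspectives) {
    const asked: string[] = [];
    for (let turn = 1; turn <= maxTurns; turn++) {
      const question = propose(topic, perspective, asked);
      if (question === null) break; // this perspective has nothing more to ask
      asked.push(question);
      const key = question.trim().toLowerCase();
      if (seen.has(key)) continue; // skip duplicates across perspectives
      seen.add(key);
      planned.push({ perspective, turn, query: question.trim() });
    }
  }
  return planned;
}
```

The planned queries would then feed buildEvidenceQueries; the `maxTurns` cap plays the same role as the default rounds/limit in topic-preset.ts.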

E. Wire existing integration (minimal moves, no vendor copy)

FROM (current) | TO (refactor target) | Purpose
ContentBase/.../context.ts readArticleEvidencePack | ContentBase/product/novel/app/article/evidence-client.ts (new) | Single Gateway evidence client
ContentBase/.../capability.ts resolveArticleRetrievalPlan | ContentBase/product/novel/app/article/retrieval-plan.ts (new) | Planner isolated from 4k-line capability
DataBase/.../research.ts | DataBase/apps/gateway/src/routes/research.ts | Keep; delegate to research-query-planner.ts
DataBase/.../topic-resolver.ts | Shrink → material-scope.ts | Replace inference rules with notebook/dataset scope
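
One possible shape for the proposed evidence-client.ts is a thin wrapper with the Gateway client injected, so the novel runtime has exactly one place that talks to EvidencePack. The sketch below makes several assumptions: `GatewayLike` stands in for whatever createDataBaseGatewayClient() returns, the real searchEvidencePack signature may differ, and the request and pack shapes (`EvidenceRequestSketch`, `EvidencePackSketch`) are invented for illustration.

```typescript
interface EvidenceRequestSketch {
  evidenceQuery: string;
  sourceIds?: string[];
  includeRagflow?: boolean;
}

interface EvidencePackSketch {
  chunks: { id: string; text: string; citation: string }[];
}

// Stand-in for the generated Gateway client; only the one method used here.
interface GatewayLike {
  searchEvidencePack(request: EvidenceRequestSketch): Promise<EvidencePackSketch>;
}

function createEvidenceClient(gateway: GatewayLike) {
  return {
    async readEvidencePack(
      request: EvidenceRequestSketch,
    ): Promise<EvidencePackSketch> {
      const query = request.evidenceQuery.trim();
      if (query.length === 0) {
        throw new Error("evidenceQuery must not be empty");
      }
      const pack = await gateway.searchEvidencePack({
        ...request,
        evidenceQuery: query,
      });
      // Drop chunks that cannot be cited; citation enforcement is ContentBase's job.
      return {
        chunks: pack.chunks.filter((chunk) => chunk.citation.trim().length > 0),
      };
    },
  };
}
```

Injecting the gateway keeps the wrapper testable with a fake and leaves context.ts free to call `readEvidencePack` without knowing about the generated client.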

F. Optional packages/ (only if sharing across Gateway + ContentBase)

New package | Purpose
DataBase/packages/evidence-core/ | Pure TS: query expansion, screening, citation normalization (imported by gateway + published to ContentBase via existing database-client)
Not recommended yet: ContentBase/packages/ | No precedent; keep novel app/article/* until a second product needs it

4. End-to-end flow (target wiring)

topic + optional notebookId / sourceIds / ragflowDatasetIds
  → Gateway scope resolve (open-notebook pattern)
  → Gateway research planner (storm pattern) → multi-round EvidencePack
      (MySQL chunks + optional RAGFlow dataset + optional web)
  → Gateway screening (paper-qa pattern)
  → ContentBase context.ts / evidence-client.ts
  → capability.ts research → writer → reviewer (citation-contract.ts)
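
The Gateway half of the flow above (scope, plan, retrieve, screen) can be sketched as one function over injected stages. This is only an illustration of the ordering; `FlowStagesSketch`, `FlowChunkSketch`, and `buildScreenedPack` are hypothetical names, and each stage is a stand-in for the corresponding module rather than its real signature.

```typescript
interface FlowChunkSketch {
  id: string;
  text: string;
}

// Each stage is injected so the ordering can be exercised with fakes.
interface FlowStagesSketch {
  resolveScope(topic: string): Promise<string[]>; // -> allowed source ids
  planQueries(topic: string): Promise<string[]>; // -> expanded queries
  retrieve(query: string, sourceIds: string[]): Promise<FlowChunkSketch[]>;
  screen(chunks: FlowChunkSketch[]): Promise<FlowChunkSketch[]>;
}

async function buildScreenedPack(
  topic: string,
  stages: FlowStagesSketch,
): Promise<FlowChunkSketch[]> {
  // 1. Scope first, so every later retrieval is bounded by it.
  const sourceIds = await stages.resolveScope(topic);
  // 2. Expand the topic into one or more rounds of queries.
  const queries = await stages.planQueries(topic);
  // 3. Retrieve per query and dedupe by chunk id across rounds.
  const byId = new Map<string, FlowChunkSketch>();
  for (const query of queries) {
    for (const chunk of await stages.retrieve(query, sourceIds)) {
      if (!byId.has(chunk.id)) byId.set(chunk.id, chunk);
    }
  }
  // 4. Screen once over the merged set before handing off to ContentBase.
  return stages.screen(Array.from(byId.values()));
}
```

Everything after the returned pack (research, writer, reviewer, citation rules) stays on the ContentBase side, matching the Gateway / ContentBase split stated earlier.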

5. What not to copy

  • open-notebook FastAPI UI, SurrealDB stack — scope semantics only
  • ragflow full api/ + web/ — HTTP client + field mapping only (124 already hosts RAGFlow)
  • paper-qa Python agent runtime — port citation + gather loop ideas to TS
  • storm Streamlit demo / rm.py search fleet — port stage graph, route web to existing web-evidence-provider
  • Do not add per-topic regex lanes in topic-resolver.ts / topic-corpus.json (explicitly discouraged in vendor/README.md and docs/evidence-scope.md)

6. Quick acceptance anchors (already documented)

  • Gateway smoke: DataBase/apps/gateway npm scripts / scripts/verify-production.ps1
  • Novel smoke: ContentBase/product/novel/tools/generate-article-mvp.mjs with --topic
  • RAGFlow: ContentMRS/scripts/deploy-ragflow-124.ps1, ragflow-readiness.ts
