Scope note
ContentMRS/DataBase and ContentMRS/ContentBase are present as nested repos (not only docs). Integration is partially live: Gateway owns EvidencePack; novel runtime consumes it via createDataBaseGatewayClient().searchEvidencePack(). Vendor trees under E:\My Project\ContentMRS\vendor\ are read-only references per vendor/README.md—copy logic, not whole apps.
1. Vendor entry files (copy FROM)
vendor/open-notebook — notebook / collection scope
| FROM path | Purpose |
|---|---|
| E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\domain\notebook.py | Notebook model; get_sources() / get_notes() via Surreal reference / artifact edges = hard notebook boundary |
| E:\My Project\ContentMRS\vendor\open-notebook\api\routers\notebooks.py | REST CRUD for notebooks; scope entry for operators |
| E:\My Project\ContentMRS\vendor\open-notebook\api\routers\sources.py | Source ingest/upload; binds materials to notebook(s) |
| E:\My Project\ContentMRS\vendor\open-notebook\api\routers\context.py | POST /notebooks/{id}/context: configurable source/note inclusion = scope DSL |
| E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\utils\context_builder.py | Generic context assembly with token budget and inclusion levels |
| E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\domain\notebook.py (vector_search, text_search) | Scoped keyword/vector search over sources/notes only |
| E:\My Project\ContentMRS\vendor\open-notebook\api\routers\search.py | HTTP wrapper for scoped search + multi-step ask graph |
| E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\graphs\ask.py | LangGraph: strategy → parallel sub-searches → final cited answer |
| E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\graphs\source.py | Ingest pipeline: extract → transform → embed; notebook_ids on state |
| E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\utils\embedding.py | Embedding for scoped vector search |
| E:\My Project\ContentMRS\vendor\open-notebook\open_notebook\utils\chunking.py | Chunk boundaries for notebook-local retrieval |
| E:\My Project\ContentMRS\vendor\open-notebook\api\main.py | FastAPI app wiring (deployment shape reference only) |
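open-notebook's context_builder.py assembles context under a token budget using per-item inclusion levels. Below is a minimal TypeScript sketch of that packing idea, roughly as it could appear in the context-budget.ts module proposed in section 3A. Several parts are illustrative assumptions, not open-notebook's API:

- the three inclusion levels;
- the 4-characters-per-token estimate;
- the names ContextItem and packContext.

```typescript
// Hypothetical inclusion levels loosely modelled on open-notebook's per-source
// context options; names and semantics are illustrative only.
type InclusionLevel = "full" | "summary" | "excluded";

interface ContextItem {
  id: string;
  level: InclusionLevel;
  content: string;
  summary: string;
  priority: number; // higher values are packed first
}

interface PackedContext {
  included: { id: string; text: string }[];
  skipped: string[];
  usedTokens: number;
}

// Crude estimate of ~4 characters per token (an assumption); a real tokenizer
// should replace this before budgets are enforced in production.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function packContext(items: ContextItem[], budgetTokens: number): PackedContext {
  const packed: PackedContext = { included: [], skipped: [], usedTokens: 0 };
  // Highest priority first; Array.prototype.sort is stable, so ties keep input order.
  const ordered = [...items].sort((a, b) => b.priority - a.priority);
  for (const item of ordered) {
    if (item.level === "excluded") {
      packed.skipped.push(item.id);
      continue;
    }
    // Try the requested level first, then degrade "full" to its summary.
    const candidates = (item.level === "full" ? [item.content, item.summary] : [item.summary])
      .filter((text) => text.length > 0);
    const fit = candidates.find((text) => packed.usedTokens + estimateTokens(text) <= budgetTokens);
    if (fit === undefined) {
      packed.skipped.push(item.id);
      continue;
    }
    packed.included.push({ id: item.id, text: fit });
    packed.usedTokens += estimateTokens(fit);
  }
  return packed;
}
```

When full text does not fit, the sketch falls back to the summary before dropping the item. This keeps more sources visible inside the budget. Whether the Gateway wants that trade-off is a design decision still to confirm.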
vendor/ragflow — dataset retrieval (hard boundary)
| FROM path | Purpose |
|---|---|
| E:\My Project\ContentMRS\vendor\ragflow\api\apps\sdk\doc.py | POST /api/v1/retrieval: same contract Gateway already calls (dataset_ids, question, thresholds) |
| E:\My Project\ContentMRS\vendor\ragflow\api\apps\services\dataset_api_service.py (search_datasets) | Core hybrid retrieval implementation + KG branch |
| E:\My Project\ContentMRS\vendor\ragflow\api\apps\restful_apis\dataset_api.py | POST /api/v1/datasets/search and per-dataset search: dataset-scoped test API |
| E:\My Project\ContentMRS\vendor\ragflow\api\db\services\knowledgebase_service.py | Dataset (knowledge base) lifecycle, access control, parser config |
| E:\My Project\ContentMRS\vendor\ragflow\api\db\services\document_service.py | Document-in-dataset indexing state |
| E:\My Project\ContentMRS\vendor\ragflow\api\apps\sdk\dify_retrieval.py | External retrieval adapter (Dify-shaped); useful for normalizing request/response |
| E:\My Project\ContentMRS\vendor\ragflow\docs\references\http_api_reference.md | Canonical HTTP field names for retrieval payloads |
| E:\My Project\ContentMRS\vendor\ragflow\web\src\services\knowledge-service.ts | Client-side retrievalTest → datasets/search (UI contract mirror) |
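The Gateway already calls RAGFlow's POST /api/v1/retrieval. The sketch below builds that request as plain data, so it can be unit-tested without a network.

- The snake_case field names (question, dataset_ids, similarity_threshold, vector_similarity_weight, page_size) and the Bearer API-key header follow RAGFlow's HTTP API reference. Check them against the vendored http_api_reference.md.
- The helper name buildRagflowRetrievalRequest is hypothetical.
- Rejecting an empty dataset list is an assumed policy. It keeps the dataset a hard boundary.

```typescript
// Field names follow RAGFlow's documented POST /api/v1/retrieval payload
// (vendor/ragflow/docs/references/http_api_reference.md); helper names are hypothetical.
interface RagflowRetrievalParams {
  question: string;
  datasetIds: string[];
  similarityThreshold?: number;
  vectorSimilarityWeight?: number;
  pageSize?: number;
}

interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildRagflowRetrievalRequest(
  baseUrl: string,
  apiKey: string,
  params: RagflowRetrievalParams,
): HttpRequestSpec {
  if (params.datasetIds.length === 0) {
    // Assumed policy: fail closed so an empty list can never widen retrieval scope.
    throw new Error("RAGFlow retrieval requires at least one dataset id");
  }
  // Send optional knobs only when the caller set them, so RAGFlow's defaults apply otherwise.
  const payload: Record<string, unknown> = {
    question: params.question,
    dataset_ids: params.datasetIds,
  };
  if (params.similarityThreshold !== undefined) payload.similarity_threshold = params.similarityThreshold;
  if (params.vectorSimilarityWeight !== undefined) payload.vector_similarity_weight = params.vectorSimilarityWeight;
  if (params.pageSize !== undefined) payload.page_size = params.pageSize;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/retrieval`,
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(payload),
  };
}
```

The returned spec can then be sent with `fetch(spec.url, spec)`.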
vendor/paper-qa — cited research / EvidencePack shape
| FROM path | Purpose |
|---|---|
| E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\agents\main.py | agent_query / run_agent: multi-tool research loop entry |
| E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\agents\tools.py | GatherEvidence, GenerateAnswer, PaperSearch; session + citation state |
| E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\agents\search.py | Local index / document store for corpus-scoped search |
| E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\docs.py | Docs collection = corpus boundary; chunking + retrieval over owned docs |
| E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\types.py | Context, PQASession, citation keys, bibliographic metadata |
| E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\core.py | Evidence summarization + relevance scoring JSON |
| E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\prompts.py | Citation-key constraints and answer-with-evidence prompts |
| E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\settings.py | Agent/index settings surface |
| E:\My Project\ContentMRS\vendor\paper-qa\src\paperqa\clients\openalex.py (and crossref.py, semantic_scholar.py) | External metadata enrichment (optional Tier-2) |
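paper-qa scores each candidate's relevance and discards weak evidence. The TypeScript sketch below shows the "score, then reject with a reason" shape that evidence-screening.ts could adopt.

- paper-qa's score comes from an LLM. Here it is swapped for a plain lexical stand-in: the share of query terms that appear in the chunk, using a crude whitespace/punctuation tokenizer. That stand-in is not paper-qa's method.
- The exclude-term gate loosely mirrors the existing exclude-query check.
- All names are hypothetical.

```typescript
interface CandidateChunk {
  id: string;
  text: string;
}

interface ScreenVerdict {
  id: string;
  accepted: boolean;
  score: number;
  reason?: string; // set only when the chunk is rejected
}

// Crude whitespace/punctuation tokenizer; a stand-in, not paper-qa's method.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s,.;:!?()"']+/)
    .filter((token) => token.length > 0);
}

// Lexical stand-in for paper-qa's LLM relevance score: share of query terms found in the chunk.
function relevanceScore(query: string, chunkText: string): number {
  const queryTerms = Array.from(new Set(tokenize(query)));
  if (queryTerms.length === 0) return 0;
  const chunkTerms = new Set(tokenize(chunkText));
  return queryTerms.filter((term) => chunkTerms.has(term)).length / queryTerms.length;
}

function screenCandidates(
  query: string,
  candidates: CandidateChunk[],
  excludeTerms: string[],
  minScore: number,
): ScreenVerdict[] {
  return candidates.map((chunk) => {
    const score = relevanceScore(query, chunk.text);
    const lowered = chunk.text.toLowerCase();
    // Exclusion wins over relevance, like an exclude-query gate.
    const hit = excludeTerms.find((term) => lowered.includes(term.toLowerCase()));
    if (hit !== undefined) {
      return { id: chunk.id, accepted: false, score, reason: `matches exclude term "${hit}"` };
    }
    if (score < minScore) {
      return { id: chunk.id, accepted: false, score, reason: `score ${score.toFixed(2)} below ${minScore}` };
    }
    return { id: chunk.id, accepted: true, score };
  });
}
```

Every verdict carries its score, and rejections also carry a reason. That lets downstream tracing explain why a chunk was dropped, not just that it was dropped.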
vendor/storm — multi-step research → outline → cited article
| FROM path | Purpose |
|---|---|
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\engine.py | STORMWikiRunner orchestrates full pipeline |
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\knowledge_curation.py | Multi-turn persona dialogue + retriever calls = research phase |
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\persona_generator.py | Multi-perspective question generation |
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\retriever.py | Retrieval adapter + source-quality rules |
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\outline_generation.py | Outline from collected information table |
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\article_generation.py | Section-wise generation with per-section retrieval |
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\article_polish.py | Second-pass polish with references |
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\storm_wiki\modules\storm_dataclass.py | StormInformationTable, StormArticle: research artifacts |
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\interface.py | Information / Retriever / module interfaces |
| E:\My Project\ContentMRS\vendor\storm\knowledge_storm\rm.py | Search backends (Tavily, Bing, etc.); web leg only |
| E:\My Project\ContentMRS\vendor\storm\examples\storm_examples\run_storm_wiki_gpt.py | Minimal runnable wiring reference |
2. Current integration points (already in repo)
ContentMRS/DataBase/apps/gateway
| Path | Purpose |
|---|---|
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes\evidence.ts | EvidencePack hub: MySQL search_chunks, semantic units, web provider, RAGFlow /api/v1/retrieval, screening metadata |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes\research.ts | POST /research/resolve-topic, GET /research/topics, POST /research/query → internal GET /evidence/search |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\evidence-screening.ts | applyEvidencePostFilters, screenEvidenceCandidates: central-claim / exclude-query gate |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\topic-resolver.ts | Topic → sourceIds, excludeQueries, scopeMode (transitional; vendor README says freeze per-topic growth) |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\topic-corpus.ts | Loads config/topic-corpus.json registry |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\category-register.ts | Category/lexicon register helpers |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\lib\lexicon-register.ts | Style lexicon projection helpers |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes\search.ts | Raw search_chunks keyword API (lower-level than EvidencePack) |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes\semantic.ts | Semantic unit reads feeding evidence path |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\ragflow-readiness.ts | RAGFlow dataset/embedding smoke for includeRagflow |
| E:\My Project\ContentMRS\DataBase\apps\gateway\src\routes.ts | Mounts /evidence/*, /research/* |
| E:\My Project\ContentMRS\DataBase\apps\gateway\config\topic-corpus.json | Topic lane defaults (should shrink as notebook/dataset scope lands) |
| E:\My Project\ContentMRS\DataBase\packages\schemas\content\canonical-content-contract.ts | Evidence object types (contract truth) |
| E:\My Project\ContentMRS\DataBase\packages\database-client\src\apis\DefaultApi.ts | Generated client: searchEvidencePack, etc. |
| E:\My Project\ContentMRS\DataBase\docs\contracts\evidence-contract.md | Canonical EvidencePack ownership rules |
evidence.ts internal seams (good split targets when extracting vendor logic):
- searchDatabaseEvidence, searchSemanticEvidence: corpus DB scope
- searchRagflowEvidence, appendRagflowChunks: RAGFlow dataset scope (mirrors vendor/ragflow/.../doc.py)
- searchWebEvidence: Tavily leg
- buildEvidenceQueries: multi-round query expansion (STORM/paper-qa pattern candidate)
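One way to split these seams is to put each leg behind a common provider interface, then merge the legs' results in one place. The sketch below does that with a few stated assumptions:

- EvidenceLegProvider, LegChunk and searchAllLegs are hypothetical names, not existing Gateway exports.
- Treating the same sourceId plus whitespace/case-normalized text as a duplicate is an assumed rule.
- Dropping a failed leg, instead of failing the whole pack, is a design choice for the sketch.

```typescript
// Hypothetical shape for the legs evidence.ts already runs (DB, semantic, RAGFlow, web).
type EvidenceLeg = "database" | "semantic" | "ragflow" | "web";

interface LegChunk {
  leg: EvidenceLeg;
  sourceId: string;
  text: string;
  score: number;
}

interface EvidenceLegProvider {
  leg: EvidenceLeg;
  search(query: string): Promise<LegChunk[]>;
}

// Normalize whitespace and case so the same passage returned by two legs collapses to one key.
function dedupKey(chunk: LegChunk): string {
  return `${chunk.sourceId}::${chunk.text.replace(/\s+/g, " ").trim().toLowerCase()}`;
}

async function searchAllLegs(providers: EvidenceLegProvider[], query: string): Promise<LegChunk[]> {
  // Run legs concurrently; a rejected leg is skipped rather than failing the whole pack.
  const settled = await Promise.allSettled(providers.map((provider) => provider.search(query)));
  const best = new Map<string, LegChunk>();
  for (const outcome of settled) {
    if (outcome.status !== "fulfilled") continue;
    for (const chunk of outcome.value) {
      const key = dedupKey(chunk);
      const existing = best.get(key);
      // Keep the highest-scoring copy of a duplicated passage.
      if (!existing || chunk.score > existing.score) best.set(key, chunk);
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}
```

Scores from different legs may not be on comparable scales. A real merge would probably need per-leg normalization before the max-score rule means much.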
ContentMRS/ContentBase/product/novel
| Path | Purpose |
|---|---|
| E:\My Project\ContentMRS\ContentBase\product\novel\app\article\context.ts | resolveArticleContextFromDataBase; readArticleEvidencePack → Gateway; normalizes citations |
| E:\My Project\ContentMRS\ContentBase\product\novel\app\article\capability.ts | Main generate path; resolveArticleRetrievalPlan (LLM planner); research/writer/reviewer agents; material screening |
| E:\My Project\ContentMRS\ContentBase\product\novel\app\article\runtime.ts | Request types: evidenceQuery, retrievalPlan, topicScopeMode |
| E:\My Project\ContentMRS\ContentBase\product\novel\app\article\topic-preset.ts | Default rounds/limit; optional topicId preset (being de-emphasized) |
| E:\My Project\ContentMRS\ContentBase\product\novel\app\article\material-relevance.ts | Post-pack material filter (ContentBase-side scope gate) |
| E:\My Project\ContentMRS\ContentBase\product\novel\app\article\article-agent-contracts.ts | Research/writer/reviewer prompts and handoffs |
| E:\My Project\ContentMRS\ContentBase\product\novel\app\article\generation-workflow.ts | LangGraph stages: context → material → write → observe |
| E:\My Project\ContentMRS\ContentBase\product\novel\app\article\context-engineering.ts | Material function plan / argument digest |
| E:\My Project\ContentMRS\ContentBase\product\novel\app\article\trace.ts | Reference coverage trace (citation QA) |
| E:\My Project\ContentMRS\ContentBase\product\novel\core\utils\database-gateway-client.ts | Thin wrapper over @emptyinkpot/database-gateway-generated-client |
| E:\My Project\ContentMRS\ContentBase\product\novel\tools\generate-article-mvp.mjs | Smoke CLI (--topic only) |
| E:\My Project\ContentMRS\ContentBase\product\novel\tools\evidence-pack-smoke.mjs | EvidencePack integration smoke |
ContentBase has no packages/ tree today; shared code lives under product/novel/app/ and core/.
3. Recommended minimal TO paths (FROM → TO)
Principle from CONTRACT.md / vendor/README.md: Gateway = evidence + scope + retrieval; ContentBase = orchestration + writing + citation enforcement. Prefer small new modules over growing evidence.ts / capability.ts further.
A. Notebook / collection scope (from open-notebook)
| FROM | TO | Purpose |
|---|---|---|
| vendor/open-notebook/open_notebook/domain/notebook.py | DataBase/apps/gateway/src/lib/material-scope.ts | Map notebookId / sourceIds / forbidden sets → EvidenceQuery filters |
| vendor/open-notebook/api/routers/context.py | DataBase/apps/gateway/src/routes/scope.ts (new) | POST /scope/resolve: explicit material-set boundary for ContentAdmin |
| vendor/open-notebook/open_notebook/utils/context_builder.py | DataBase/apps/gateway/src/lib/context-budget.ts | Token-budgeted context packing for large packs (optional) |
| vendor/open-notebook/open_notebook/domain/notebook.py (vector_search) | DataBase/apps/gateway/src/lib/scope-search.ts | Pattern for scoped DB search (filter by sourceIds only), not Surreal |
Do not copy open-notebook UI/API wholesale into ContentAdmin; only scope + binding semantics.
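The material-scope.ts row maps a notebookId, explicit sourceIds and forbidden sets onto EvidenceQuery filters. The sketch below reduces those inputs to a single source allow-list. Its rules are assumptions, not an existing Gateway contract:

- Explicit ids can narrow a notebook but never widen it.
- Forbidden ids always win.
- A request with no scope fails closed instead of searching the whole corpus.

ScopeRequest, ScopeFilter and resolveMaterialScope are hypothetical names. The notebook lookup stands in for open-notebook's notebook-to-source edges.

```typescript
// Hypothetical inputs; the real EvidenceQuery fields live in the Gateway and may differ.
interface ScopeRequest {
  notebookId?: string;
  sourceIds?: string[];
  forbiddenSourceIds?: string[];
}

interface ScopeFilter {
  allowedSourceIds: string[]; // hard boundary: only these sources may be searched
  mode: "notebook" | "explicit" | "notebook+explicit";
}

// Hypothetical registry lookup standing in for open-notebook's notebook → source edges.
type NotebookSources = (notebookId: string) => string[] | undefined;

function resolveMaterialScope(req: ScopeRequest, notebookSources: NotebookSources): ScopeFilter {
  const fromNotebook = req.notebookId !== undefined ? notebookSources(req.notebookId) : undefined;
  if (req.notebookId !== undefined && fromNotebook === undefined) {
    throw new Error(`unknown notebook ${req.notebookId}`);
  }
  const explicit = req.sourceIds ?? [];
  let allowed: string[];
  let mode: ScopeFilter["mode"];
  if (fromNotebook && explicit.length > 0) {
    // Both given: intersect, so explicit ids can narrow but never widen the notebook.
    const inNotebook = new Set(fromNotebook);
    allowed = explicit.filter((id) => inNotebook.has(id));
    mode = "notebook+explicit";
  } else if (fromNotebook) {
    allowed = [...fromNotebook];
    mode = "notebook";
  } else if (explicit.length > 0) {
    allowed = [...explicit];
    mode = "explicit";
  } else {
    // No scope at all: fail closed instead of silently searching the whole corpus.
    throw new Error("scope requires notebookId or sourceIds");
  }
  // Forbidden ids always win; duplicates are removed.
  const forbidden = new Set(req.forbiddenSourceIds ?? []);
  return { allowedSourceIds: Array.from(new Set(allowed)).filter((id) => !forbidden.has(id)), mode };
}
```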
B. Dataset retrieval (from ragflow)
| FROM | TO | Purpose |
|---|---|---|
| vendor/ragflow/api/apps/sdk/doc.py (retrieval_test) | DataBase/apps/gateway/src/lib/ragflow-retrieval.ts | Extract searchRagflowEvidence + chunk normalization from evidence.ts |
| vendor/ragflow/api/apps/services/dataset_api_service.py (search_datasets) | DataBase/apps/gateway/src/lib/ragflow-retrieval.ts | Reference for extra flags (use_kg, meta_data_filter) |
| vendor/ragflow/api/db/services/knowledgebase_service.py | DataBase/apps/gateway/src/ragflow-readiness.ts | Align readiness checks with upstream dataset/embedding rules |
| (existing) evidence.ts appendRagflowChunks | DataBase/packages/schemas/content/canonical-content-contract.ts | Keep one EvidenceChunk/citation shape (already here) |
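Extracting chunk normalization into ragflow-retrieval.ts could look roughly like the sketch below.

- The RagflowChunk fields (id, content, document_id, document_keyword, kb_id, similarity) and the `{ code, data: { chunks, total } }` envelope follow RAGFlow's retrieval response as documented in its HTTP API reference. Verify them against the vendored copy.
- NormalizedEvidenceChunk is a hypothetical stand-in for the real EvidenceChunk type in canonical-content-contract.ts.
- Re-checking kb_id against the requested datasets is an assumed defensive step.

```typescript
// Subset of RAGFlow retrieval response fields used here; verify against the vendored
// docs/references/http_api_reference.md before relying on them.
interface RagflowChunk {
  id: string;
  content: string;
  document_id: string;
  document_keyword?: string; // document name in RAGFlow's retrieval output
  kb_id: string;
  similarity: number;
}

interface RagflowRetrievalResponse {
  code: number;
  message?: string;
  data?: { chunks: RagflowChunk[]; total: number };
}

// Hypothetical stand-in for the EvidenceChunk type in canonical-content-contract.ts.
interface NormalizedEvidenceChunk {
  chunkId: string;
  text: string;
  sourceId: string;
  title: string;
  datasetId: string;
  score: number;
  provider: "ragflow";
}

function normalizeRagflowResponse(
  resp: RagflowRetrievalResponse,
  allowedDatasetIds: string[],
): NormalizedEvidenceChunk[] {
  // RAGFlow signals success with code 0; anything else is surfaced as an error.
  if (resp.code !== 0 || !resp.data) {
    throw new Error(`RAGFlow retrieval failed: code=${resp.code} ${resp.message ?? ""}`.trim());
  }
  const allowed = new Set(allowedDatasetIds);
  return resp.data.chunks
    // Defensive re-check of the dataset boundary, and drop empty chunks.
    .filter((chunk) => allowed.has(chunk.kb_id) && chunk.content.trim().length > 0)
    .map((chunk) => ({
      chunkId: chunk.id,
      text: chunk.content.trim(),
      sourceId: chunk.document_id,
      title: chunk.document_keyword ?? chunk.document_id,
      datasetId: chunk.kb_id,
      score: chunk.similarity,
      provider: "ragflow" as const,
    }));
}
```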
C. Cited research / EvidencePack (from paper-qa)
| FROM | TO | Purpose |
|---|---|---|
| vendor/paper-qa/src/paperqa/agents/tools.py (GatherEvidence, GenerateAnswer) | DataBase/apps/gateway/src/lib/evidence-screening.ts | Stronger relevance scoring + rejection reasons |
| vendor/paper-qa/src/paperqa/prompts.py (CITATION_KEY_CONSTRAINTS) | ContentBase/product/novel/app/article/citation-contract.ts (new) | Writer/reviewer citation key rules |
| vendor/paper-qa/src/paperqa/types.py (PQASession, Context) | DataBase/packages/schemas/content/canonical-content-contract.ts | Extend only if contract lacks fields; avoid parallel schema |
| vendor/paper-qa/src/paperqa/docs.py | DataBase/apps/gateway/scripts/ (ingest job, not runtime) | Corpus ingest pattern → search_documents/search_chunks pipeline |
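The citation-contract.ts row carries over paper-qa's rule that an answer may cite only keys that exist in the evidence it was given. Below is a small checker a reviewer step could run.

- The `[E1]`, `[E2]` marker format is a hypothetical convention. paper-qa enforces its own key format through prompts.
- Treating "no unknown keys and at least one citation" as passing is an assumed policy.
- checkCitations is an illustrative name.

```typescript
// Hypothetical citation marker format: [E1], [E2], ... keyed to EvidencePack items.
const CITATION_PATTERN = /\[(E\d+)\]/g;

interface CitationReport {
  cited: string[]; // distinct keys in order of first use
  unknown: string[]; // keys the writer used that are not in the pack
  uncited: string[]; // pack keys never referenced
  ok: boolean;
}

function checkCitations(draft: string, packKeys: string[]): CitationReport {
  const known = new Set(packKeys);
  // matchAll works on a copy of the global regex, so the shared constant is not mutated.
  const cited = Array.from(new Set(Array.from(draft.matchAll(CITATION_PATTERN), (m) => m[1])));
  const citedSet = new Set(cited);
  const unknown = cited.filter((key) => !known.has(key));
  const uncited = packKeys.filter((key) => !citedSet.has(key));
  // Assumed policy: fail on any invented key, or on a draft with no citations at all.
  return { cited, unknown, uncited, ok: unknown.length === 0 && cited.length > 0 };
}
```

The uncited list is reported but does not fail the check. Whether unused evidence should block a draft is left to the reviewer policy.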
D. Multi-step research (from storm)
| FROM | TO | Purpose |
|---|---|---|
| vendor/storm/.../knowledge_curation.py | DataBase/apps/gateway/src/lib/research-query-planner.ts (new) | Multi-query / multi-round expansion before buildEvidenceQueries |
| vendor/storm/.../persona_generator.py | DataBase/apps/gateway/src/routes/research.ts | Optional sub-query perspectives (replace topic-regex inference) |
| vendor/storm/.../outline_generation.py | ContentBase/product/novel/app/article/research-outline.ts (new) | Outline artifact between research and write (not in Gateway) |
| vendor/storm/.../article_generation.py | ContentBase/product/novel/app/article/capability.ts | Sectioned write with per-section retrieval (thin hooks only) |
| vendor/storm/knowledge_storm/rm.py | DataBase/apps/web-evidence-provider/ | Web search already split; keep STORM RM out of novel core |
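STORM's research phase asks questions from several perspectives before writing. The research-query-planner.ts row could start from a deterministic expansion step like the sketch below.

- In STORM the perspectives come from an LLM persona step. Here they are plain inputs.
- The `topic + term` query template, the ordering, and the per-round cap are assumptions.
- All names are hypothetical.

```typescript
// Hypothetical planner input. In STORM the perspectives come from an LLM persona step;
// here they are passed in so the expansion itself stays deterministic and testable.
interface PlannerInput {
  topic: string;
  perspectives: string[]; // e.g. "legal", "economic"
  followUps?: string[]; // gaps reported by the previous EvidencePack round
  maxQueries: number;
}

interface PlannedQuery {
  query: string;
  perspective: string | null;
}

function normalizeQuery(query: string): string {
  return query.replace(/\s+/g, " ").trim().toLowerCase();
}

function planResearchQueries(input: PlannerInput): PlannedQuery[] {
  const seen = new Set<string>();
  const planned: PlannedQuery[] = [];
  const push = (query: string, perspective: string | null): void => {
    const key = normalizeQuery(query);
    // Skip empties and case/whitespace duplicates; stop once the round budget is spent.
    if (key.length === 0 || seen.has(key) || planned.length >= input.maxQueries) return;
    seen.add(key);
    planned.push({ query: query.replace(/\s+/g, " ").trim(), perspective });
  };
  // Order: the bare topic, then gap follow-ups from the last round, then one query per perspective.
  push(input.topic, null);
  for (const gap of input.followUps ?? []) push(`${input.topic} ${gap}`, null);
  for (const perspective of input.perspectives) push(`${input.topic} ${perspective}`, perspective.trim());
  return planned;
}
```

Putting follow-ups ahead of perspectives means later rounds spend their budget on reported gaps first. This is one possible reading of "multi-round expansion", not STORM's exact scheduling.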
E. Wire existing integration (minimal moves, no vendor copy)
| FROM (current) | TO (refactor target) | Purpose |
|---|---|---|
| ContentBase/.../context.ts readArticleEvidencePack | ContentBase/product/novel/app/article/evidence-client.ts (new) | Single Gateway evidence client |
| ContentBase/.../capability.ts resolveArticleRetrievalPlan | ContentBase/product/novel/app/article/retrieval-plan.ts (new) | Planner isolated from 4k-line capability |
| DataBase/.../research.ts | DataBase/apps/gateway/src/routes/research.ts | Keep; delegate to research-query-planner.ts |
| DataBase/.../topic-resolver.ts | Shrink → material-scope.ts | Replace inference rules with notebook/dataset scope |
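A single evidence-client.ts could hide the Gateway call behind one function. That function would also assign the citation keys that writer, reviewer and trace share. The sketch makes these assumptions:

- The transport is injected. In the repo it would be a thin adapter around createDataBaseGatewayClient().searchEvidencePack(), whose real signature is not assumed here.
- PackItem, EvidenceRequest and createEvidenceClient are hypothetical names.
- The `E1`, `E2` key scheme and the default limit of 20 are illustrative choices.

```typescript
// Hypothetical minimal pack shape; the real EvidencePack type comes from the
// generated database-gateway client / canonical-content-contract.ts.
interface PackItem {
  chunkId: string;
  sourceId: string;
  text: string;
}

interface EvidencePackLike {
  items: PackItem[];
}

interface EvidenceRequest {
  query: string;
  sourceIds?: string[];
  limit?: number;
}

// Injected transport, e.g. an adapter around the generated gateway client.
type SearchEvidencePack = (req: EvidenceRequest) => Promise<EvidencePackLike>;

interface ArticleEvidence {
  items: PackItem[];
  citationKeys: Map<string, string>; // citation key -> chunkId
}

function createEvidenceClient(search: SearchEvidencePack, defaultLimit = 20) {
  return {
    async readArticleEvidence(req: EvidenceRequest): Promise<ArticleEvidence> {
      const pack = await search({ ...req, limit: req.limit ?? defaultLimit });
      // Drop duplicate chunks, keeping first occurrence and pack order.
      const seen = new Set<string>();
      const items: PackItem[] = [];
      for (const item of pack.items) {
        if (seen.has(item.chunkId)) continue;
        seen.add(item.chunkId);
        items.push(item);
      }
      // Stable keys in pack order so every downstream stage cites the same ids.
      const citationKeys = new Map<string, string>(
        items.map((item, index): [string, string] => [`E${index + 1}`, item.chunkId]),
      );
      return { items, citationKeys };
    },
  };
}
```

Injecting the transport keeps this module testable without a running Gateway. It also makes context.ts and capability.ts depend on one narrow function, not on the generated client directly.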
F. Optional packages/ (only if sharing across Gateway + ContentBase)
| New package | Purpose |
|---|---|
| DataBase/packages/evidence-core/ | Pure TS: query expansion, screening, citation normalization (imported by gateway + published to ContentBase via existing database-client) |
| Not recommended yet: ContentBase/packages/ | No precedent; keep novel app/article/* until a second product needs it |
4. End-to-end flow (target wiring)
topic + optional notebookId / sourceIds / ragflowDatasetIds
→ Gateway scope resolve (open-notebook pattern)
→ Gateway research planner (storm pattern) → multi-round EvidencePack
(MySQL chunks + optional RAGFlow dataset + optional web)
→ Gateway screening (paper-qa pattern)
→ ContentBase context.ts / evidence-client.ts
→ capability.ts research → writer → reviewer (citation-contract.ts)
5. What not to copy
- open-notebook FastAPI UI, SurrealDB stack: scope semantics only
- ragflow full api/ + web/: HTTP client + field mapping only (124 already hosts RAGFlow)
- paper-qa Python agent runtime: port citation + gather-loop ideas to TS
- storm Streamlit demo / rm.py search fleet: port the stage graph; route web search to the existing web-evidence-provider
- Do not add per-topic regex lanes in topic-resolver.ts / topic-corpus.json (explicitly discouraged in vendor/README.md and docs/evidence-scope.md)
6. Quick acceptance anchors (already documented)
- Gateway smoke: DataBase/apps/gateway npm scripts / scripts/verify-production.ps1
- Novel smoke: ContentBase/product/novel/tools/generate-article-mvp.mjs with --topic
- RAGFlow: ContentMRS/scripts/deploy-ragflow-124.ps1, ragflow-readiness.ts