CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Document-MCP is a multi-tenant, Obsidian-like documentation viewer with an AI-first workflow. AI agents write and update documentation via MCP (Model Context Protocol), while humans read and edit through a web UI. The system provides per-user vaults with Markdown notes, full-text search (SQLite FTS5), wikilink resolution, tag indexing, and backlink tracking.
Architecture: Python backend (FastAPI + FastMCP) + React frontend (Vite + shadcn/ui)
Key Concepts:
- Vault: Per-user filesystem directory containing .md files
- MCP Server: Exposes tools for AI agents (STDIO for local, HTTP for remote with JWT)
- Indexer: SQLite FTS5 for full-text search + separate tables for tags/links/metadata
- Wikilinks: [[Note Name]] resolved via case-insensitive slug matching (prefers same folder, then lexicographic)
- Optimistic Concurrency: Version counter in SQLite (not frontmatter); UI sends if_version, MCP uses last-write-wins
Development Commands
Backend (Python 3.11+)
cd backend
# Setup (first time)
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install -e .
# Install dev dependencies
uv pip install -e ".[dev]"
# Run FastAPI HTTP server (for UI)
uv run uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
# Run MCP STDIO server (for Claude Desktop/Code)
uv run python src/mcp/server.py
# Run MCP HTTP server (for remote clients with JWT)
uv run python src/mcp/server.py --http --port 8001
# Tests
uv run pytest # All tests
uv run pytest tests/unit # Unit tests only
uv run pytest tests/integration # Integration tests
uv run pytest -k test_vault_write # Single test pattern
uv run pytest -v # Verbose output
uv run pytest --lf # Last failed tests
Frontend (Node 18+, React + Vite)
cd frontend
# Setup (first time)
npm install
# Development server
npm run dev # Start Vite dev server (http://localhost:5173)
# Build
npm run build # TypeScript compile + Vite build to dist/
# Lint
npm run lint # ESLint check
# Preview production build
npm run preview # Serve dist/ (after npm run build)
Database Initialization
# Backend database is auto-initialized on first run
# Manual reset (WARNING: destroys all data)
cd backend
rm -f ../data/index.db
uv run python -c "from src.services.database import DatabaseService; DatabaseService().initialize()"
Architecture Deep Dive
Backend Service Layers
3-tier architecture:
- Models (backend/src/models/): Pydantic schemas for validation
  - note.py: Note, NoteMetadata, NoteSummary
  - user.py: User, UserProfile
  - search.py: SearchResult, SearchQuery
  - index.py: IndexHealth
  - auth.py: TokenRequest, TokenResponse
- Services (backend/src/services/): Business logic
  - vault.py: Filesystem operations (read/write/list/delete notes)
    - validate_note_path(): Path security (no .., max 256 chars, Unix separators)
    - sanitize_path(): Resolves and enforces vault root boundary
  - indexer.py: SQLite FTS5 + metadata tracking
    - index_note(): Updates metadata, FTS, tags, links (synchronous on every write)
    - search_notes(): BM25 ranking with title 3x weight, body 1x, recency bonus
    - get_backlinks(): Follows link graph (note → sources that reference it)
  - auth.py: JWT + HF OAuth integration
    - create_access_token(): Issues JWT with sub=user_id, exp=90 days
    - verify_token(): Validates JWT and extracts user_id
  - config.py: Env var management (MODE, JWT_SECRET_KEY, VAULT_BASE_DIR, etc.)
  - database.py: SQLite connection manager + schema DDL
- API/MCP (backend/src/api/ and backend/src/mcp/):
  - api/routes/: FastAPI endpoints (18 routes: auth, notes CRUD, search, backlinks, tags, index health/rebuild, graph, demo, system)
  - api/middleware/auth_middleware.py: JWT Bearer token validation
  - mcp/server.py: FastMCP tools (7 tools: list, read, write, delete, search, backlinks, tags)
Critical Path Validation (in vault.py):
- All note paths MUST pass validate_note_path() (returns (bool, str) tuple)
- Then sanitize_path() resolves and ensures no vault escape
- Failure = 400 Bad Request with specific error message
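The two-step check above can be sketched as follows. This is a minimal illustration, not the code in vault.py: the function names and the (bool, str) return shape come from this document, while the error messages, the rejection of absolute paths, and the sanitize_path() signature are assumptions.

```python
from pathlib import Path

MAX_PATH_LENGTH = 256  # limit stated in this document


def validate_note_path(note_path: str) -> tuple[bool, str]:
    """Return (is_valid, error_message); error_message is "" when valid."""
    if not note_path or len(note_path) > MAX_PATH_LENGTH:
        return False, f"Path must be 1-{MAX_PATH_LENGTH} characters"
    if "\\" in note_path:
        return False, "Path must use Unix separators (/)"
    if note_path.startswith("/"):
        # Assumption: note paths are always relative to the vault root.
        return False, "Path must be relative to the vault root"
    if ".." in note_path.split("/"):
        return False, "Path must not contain '..'"
    return True, ""


def sanitize_path(vault_root: Path, note_path: str) -> Path:
    """Resolve note_path under vault_root and refuse anything that escapes it."""
    root = vault_root.resolve()
    candidate = (root / note_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError("Path escapes the vault root")
    return candidate
```

A route handler would call validate_note_path() first and turn a False result into a 400 response carrying the message, then call sanitize_path() before touching the filesystem.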
SQLite Index Schema
5 tables (see backend/src/services/database.py):
- note_metadata: Version tracking, size, timestamps (per note)
- note_fts: Contentless FTS5 with porter tokenizer, prefix='2 3' for autocomplete
- note_tags: Many-to-many (user_id, note_path, tag)
- note_links: Link graph (source_path → target_path, is_resolved flag)
- index_health: Aggregate stats (note_count, last_full_rebuild, last_incremental_update)
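To make the note_fts options concrete, here is a small sketch of a contentless FTS5 table with the porter tokenizer and prefix='2 3', queried with BM25 column weights of 3.0 for title and 1.0 for body. The column names, the sample notes, and the use of rowid as the note key are assumptions; the real DDL lives in database.py, and the recency bonus mentioned for search_notes() is not modeled here. It requires a SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Contentless table (content=''): only the search index is stored, so the
# note text itself must be read from the vault. Columns are assumptions.
conn.execute(
    "CREATE VIRTUAL TABLE note_fts USING fts5("
    "title, body, content='', tokenize='porter', prefix='2 3')"
)

# Hypothetical sample data keyed by rowid.
NOTES = {
    1: ("API Design", "Guidelines for endpoints and versioning"),
    2: ("Meeting Notes", "We discussed the api rollout schedule"),
    3: ("Gardening", "Tomatoes need sun and water"),
}
conn.executemany(
    "INSERT INTO note_fts(rowid, title, body) VALUES (?, ?, ?)",
    [(rowid, title, body) for rowid, (title, body) in NOTES.items()],
)


def search_notes(query: str, limit: int = 10) -> list[int]:
    """Return matching rowids, best match first.

    bm25() returns lower values for better matches, so ascending order
    puts the best hit first. The weights follow column order: title, body.
    """
    rows = conn.execute(
        "SELECT rowid FROM note_fts WHERE note_fts MATCH ? "
        "ORDER BY bm25(note_fts, 3.0, 1.0) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [rowid for (rowid,) in rows]


print(search_notes("api"))       # title hit (note 1) ranks above body hit (note 2)
print(search_notes("endpoint"))  # porter stemming matches "endpoints"
```

Because the table is contentless, a query can return only rowids and rank values, which is why a separate metadata table is needed to map results back to note paths.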
Indexer Update Flow (in indexer.py):
write_note() → vault.write_note() → indexer.index_note()
↓
[metadata table: version++]
[FTS table: re-insert title+body]
[tags table: clear + re-insert]
[links table: extract wikilinks, resolve, update backlinks]
[health table: note_count++, last_incremental_update=now]
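The first steps of this flow (version bump, then clear and re-insert tags) can be sketched with plain SQLite tables. The table names follow this document; the column sets, the index_note() signature, and starting the version at 1 are assumptions, and the FTS, links, and health updates are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE note_metadata (
        user_id   TEXT NOT NULL,
        note_path TEXT NOT NULL,
        version   INTEGER NOT NULL,
        PRIMARY KEY (user_id, note_path)
    );
    CREATE TABLE note_tags (
        user_id   TEXT NOT NULL,
        note_path TEXT NOT NULL,
        tag       TEXT NOT NULL
    );
    """
)


def index_note(user_id: str, note_path: str, tags: list[str]) -> int:
    """Bump the note's version and replace its tags in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        # First write creates version 1; later writes increment it.
        conn.execute(
            "INSERT INTO note_metadata (user_id, note_path, version) "
            "VALUES (?, ?, 1) "
            "ON CONFLICT(user_id, note_path) DO UPDATE SET version = version + 1",
            (user_id, note_path),
        )
        # Clear + re-insert keeps the tag rows in sync with the new content.
        conn.execute(
            "DELETE FROM note_tags WHERE user_id = ? AND note_path = ?",
            (user_id, note_path),
        )
        conn.executemany(
            "INSERT INTO note_tags (user_id, note_path, tag) VALUES (?, ?, ?)",
            [(user_id, note_path, tag) for tag in tags],
        )
        (version,) = conn.execute(
            "SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?",
            (user_id, note_path),
        ).fetchone()
    return version
```

Wrapping all the updates in a single transaction means a failure partway through leaves the previous index state intact rather than a half-updated one.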
Wikilink Resolution Algorithm
In indexer.py (resolve_wikilink logic):
- Normalize link text to slug: normalize_slug("API Design") → "api-design"
- Find all notes where slug matches normalize_slug(title) or normalize_slug(filename_stem)
- If multiple matches:
  - Prefer same folder as source note
  - Else lexicographically smallest path (ASCII sort)
- Store in note_links table with is_resolved=1 (or 0 if no match)
Broken links are tracked (is_resolved=0) and can be queried for UI "Create note" affordance.
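The resolution steps above can be sketched in a few lines. This is an illustration under assumptions, not the indexer.py code: normalize_slug() is named in this document, but its exact normalization rule, the resolve_wikilink() signature, and the in-memory notes mapping (path to title) are hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def normalize_slug(text: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def resolve_wikilink(
    link_text: str, source_path: str, notes: dict[str, str]
) -> Optional[str]:
    """Return the target note path, or None for a broken link.

    `notes` maps note path -> title (a stand-in for the metadata table).
    """
    slug = normalize_slug(link_text)
    matches = [
        path
        for path, title in notes.items()
        if slug in (normalize_slug(title), normalize_slug(PurePosixPath(path).stem))
    ]
    if not matches:
        return None  # would be stored with is_resolved=0
    source_folder = PurePosixPath(source_path).parent
    same_folder = [p for p in matches if PurePosixPath(p).parent == source_folder]
    # Prefer the source note's folder; otherwise take the smallest path.
    return min(same_folder or matches)


notes = {
    "guides/api-design.md": "API Design",
    "archive/api-design.md": "API Design",
    "guides/intro.md": "Intro",
}
print(resolve_wikilink("API Design", "guides/intro.md", notes))  # guides/api-design.md
print(resolve_wikilink("api design", "other/todo.md", notes))    # archive/api-design.md
```

Python's min() on strings compares code points, which matches the ASCII sort described above for plain ASCII paths.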
MCP Server Modes
STDIO (python src/mcp/server.py):
- For Claude Desktop/Code local integration
- Uses LOCAL_USER_ID from env (default: "local-dev")
- No authentication
HTTP (python src/mcp/server.py --http --port 8001):
- For remote clients (HF Space deployment)
- Requires Authorization: Bearer <jwt> header
- JWT validated → user_id extracted → scoped to that user's vault
Endpoint: Tools defined in mcp/server.py with FastMCP decorators (@mcp.tool)
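To illustrate the HTTP-mode flow (Bearer header → JWT validated → user_id), here is a standard-library sketch of HS256 signing and verification. It is written only to show the mechanics: the project's auth.py presumably relies on a JWT library, and everything here other than the function names create_access_token() and verify_token(), the sub claim, and the 90-day expiry is an assumption.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

NINETY_DAYS = 90 * 24 * 3600  # expiry stated in this document


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256)
    return _b64url(digest.digest())


def create_access_token(user_id: str, secret: str, now: Optional[float] = None) -> str:
    """Issue an HS256 token with sub=user_id and exp 90 days after `now`."""
    issued = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": user_id, "exp": issued + NINETY_DAYS}).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', secret)}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> str:
    """Return the user_id from a valid token; raise ValueError otherwise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("Malformed token")
    header, payload, signature = parts
    # Always recompute with HS256; the header's alg field is never trusted.
    if not hmac.compare_digest(signature, _sign(f"{header}.{payload}", secret)):
        raise ValueError("Invalid signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("Token expired")
    return claims["sub"]


def user_id_from_header(authorization: str, secret: str) -> str:
    """Extract and verify the token from an 'Authorization: Bearer <jwt>' value."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("Expected 'Bearer <jwt>'")
    return verify_token(token.strip(), secret)
```

The returned user_id is what scopes every subsequent tool call to a single vault, so a request whose token fails verification should be rejected before any vault path is computed.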
Frontend Architecture
Component Hierarchy:
App.tsx (main layout)
├── DirectoryTree.tsx (left sidebar: vault explorer with virtualization)
├── NoteViewer.tsx (right pane: read mode, react-markdown rendering)
├── NoteEditor.tsx (right pane: edit mode, split view with live preview)
├── SearchBar.tsx (debounced search with dropdown results)
└── AuthFlow.tsx (HF OAuth login, token management)
Key Libraries:
- react-markdown: Markdown rendering with wikilink custom renderer
- shadcn/ui: UI components (Tree, ScrollArea, Button, Textarea, Dialog)
- lib/wikilink.ts: Parse [[...]] + resolve via GET /api/backlinks
- services/api.ts: Fetch wrapper with Bearer token injection
Wikilink Rendering (in NoteViewer.tsx):
- Custom react-markdown renderer for links
- Detect [[Note Name]] pattern → fetch backlinks → resolve to path → make clickable
- Broken links styled differently (e.g., red/dashed underline)
Version Conflict Flow (Optimistic Concurrency)
UI Edit Scenario:
- User opens note → GET /api/notes/{path} → receives {..., version: 5}
- User edits → clicks Save → PUT /api/notes/{path} with {"if_version": 5, ...}
- Backend checks: if current version != 5 → return 409 Conflict
- UI shows "Note changed, please reload" message
MCP Write: No version check, always succeeds (last-write-wins).
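The difference between the two write paths can be sketched with an in-memory store. The if_version parameter name comes from this document; the VersionConflict exception, the save_note() signature, and the dict-based store are hypothetical stand-ins for the real vault and indexer services.

```python
from typing import Optional


class VersionConflict(Exception):
    """Raised when if_version does not match; an API layer would map this to 409."""


def save_note(
    store: dict[str, dict], path: str, body: str, if_version: Optional[int] = None
) -> int:
    """Write a note and return its new version.

    if_version set (UI path): reject the write when the note has changed.
    if_version None (MCP path): last-write-wins, no check.
    """
    current_version = store[path]["version"] if path in store else 0
    if if_version is not None and if_version != current_version:
        raise VersionConflict(
            f"Expected version {if_version}, found {current_version}"
        )
    store[path] = {"body": body, "version": current_version + 1}
    return store[path]["version"]


store = {"a.md": {"body": "original", "version": 5}}
print(save_note(store, "a.md", "edit from tab 1", if_version=5))  # 6
try:
    save_note(store, "a.md", "edit from tab 2", if_version=5)     # stale version
except VersionConflict as exc:
    print("conflict:", exc)
print(save_note(store, "a.md", "agent rewrite"))                   # 7, no check
```

One consequence of this design is that an MCP write can silently supersede text a human is editing; the human's next save then fails with a conflict instead of overwriting the agent's change.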
Environment Configuration
See .env.example for all variables. Key settings:
- MODE: local (single-user, no OAuth) or space (HF multi-tenant)
- JWT_SECRET_KEY: Generate with python -c "import secrets; print(secrets.token_urlsafe(32))"
- VAULT_BASE_DIR: Where vaults are stored (e.g., ./data/vaults)
- DB_PATH: SQLite database file (e.g., ./data/index.db)
- LOCAL_USER_ID: Default user for local mode (default: local-dev)
HF Space variables (only needed when MODE=space):
- HF_OAUTH_CLIENT_ID, HF_OAUTH_CLIENT_SECRET, HF_SPACE_HOST
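As a rough picture of how these variables might be consumed, the sketch below loads them into a frozen dataclass and fails fast when space mode lacks its required settings. The variable names are from this document; the Settings class, load_settings(), the fallback defaults for MODE, VAULT_BASE_DIR, and DB_PATH, and the rule that JWT_SECRET_KEY is mandatory in space mode are all assumptions rather than the behavior of config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SPACE_ONLY_VARS = (
    "JWT_SECRET_KEY",
    "HF_OAUTH_CLIENT_ID",
    "HF_OAUTH_CLIENT_SECRET",
    "HF_SPACE_HOST",
)


@dataclass(frozen=True)
class Settings:
    mode: str
    vault_base_dir: str
    db_path: str
    local_user_id: str
    jwt_secret_key: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, validating MODE."""
    env = os.environ if env is None else env
    mode = env.get("MODE", "local")  # assumed default
    if mode not in ("local", "space"):
        raise ValueError(f"MODE must be 'local' or 'space', got {mode!r}")
    if mode == "space":
        missing = [name for name in SPACE_ONLY_VARS if not env.get(name)]
        if missing:
            raise ValueError(f"Missing variables for space mode: {', '.join(missing)}")
    return Settings(
        mode=mode,
        vault_base_dir=env.get("VAULT_BASE_DIR", "./data/vaults"),  # example value
        db_path=env.get("DB_PATH", "./data/index.db"),              # example value
        local_user_id=env.get("LOCAL_USER_ID", "local-dev"),
        jwt_secret_key=env.get("JWT_SECRET_KEY"),
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in unit tests.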
Constraints & Limits
- Note size: 1 MiB max (enforced in vault.py)
- Vault limit: 5,000 notes per user (configurable in indexer.py)
- Path length: 256 chars max (validated in vault.py)
- Wikilink syntax: Only [[wikilink]] supported (no aliases like [[link|alias]])
Performance Targets
- MCP operations: <500ms for 1,000-note vaults
- UI directory load: <2s
- Note render: <1s
- Search: <1s for 5,000 notes
- Index rebuild: <30s for 1,000 notes
SpecKit Workflow (in .specify/)
This repo uses the SpecKit methodology for feature planning:
- specs/###-feature-name/: Feature documentation
  - spec.md: User stories, requirements, success criteria
  - plan.md: Tech stack, architecture, structure
  - data-model.md: Entities, schemas, validation
  - contracts/: OpenAPI + MCP tool schemas
  - tasks.md: Implementation task checklist
- Slash commands: /speckit.specify, /speckit.plan, /speckit.tasks, /speckit.implement
- Scripts: .specify/scripts/bash/ (feature scaffolding, context updates)
Current active feature: 001-obsidian-docs-viewer
MCP Client Configuration
Claude Desktop (STDIO, local mode):
{
"mcpServers": {
"document-mcp": {
"command": "uv",
"args": ["run", "python", "src/mcp/server.py"],
"cwd": "/absolute/path/to/Document-MCP/backend"
}
}
}
Remote HTTP (HF Space with JWT):
{
"mcpServers": {
"document-mcp": {
"url": "https://your-space.hf.space/mcp",
"transport": "http",
"headers": {
"Authorization": "Bearer YOUR_JWT_TOKEN"
}
}
}
}
Obtain JWT: POST /api/tokens after HF OAuth login.
Active Technologies
- Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI (004-gemini-vault-chat)
- Filesystem vault (existing), LlamaIndex persisted vector store (new, under data/llamaindex/) (004-gemini-vault-chat)
- TypeScript 5.x, React 18+ (006-ui-polish)
- localStorage for user preferences (font size, TOC panel state) (006-ui-polish)
Recent Changes
- 004-gemini-vault-chat: Added Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI