Spaces:
Running
Running
File size: 10,902 Bytes
feee0c0 2510c5e feee0c0 2510c5e feee0c0 2510c5e feee0c0 2510c5e feee0c0 2510c5e feee0c0 14701ee feee0c0 2510c5e feee0c0 2510c5e feee0c0 14701ee feee0c0 14701ee feee0c0 2510c5e feee0c0 2510c5e feee0c0 2510c5e feee0c0 2510c5e feee0c0 2510c5e feee0c0 05c9551 3c7dbf6 05c9551 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 |
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
**Document-MCP** is a multi-tenant Obsidian-like documentation viewer with AI-first workflow. AI agents write/update documentation via MCP (Model Context Protocol), while humans read and edit through a web UI. The system provides per-user vaults with Markdown notes, full-text search (SQLite FTS5), wikilink resolution, tag indexing, and backlink tracking.
**Architecture**: Python backend (FastAPI + FastMCP) + React frontend (Vite + shadcn/ui)
**Key Concepts**:
- **Vault**: Per-user filesystem directory containing .md files
- **MCP Server**: Exposes tools for AI agents (STDIO for local, HTTP for remote with JWT)
- **Indexer**: SQLite FTS5 for full-text search + separate tables for tags/links/metadata
- **Wikilinks**: `[[Note Name]]` resolved via case-insensitive slug matching (prefers same folder, then lexicographic)
- **Optimistic Concurrency**: Version counter in SQLite (not frontmatter); UI sends `if_version`, MCP uses last-write-wins
## Development Commands
### Backend (Python 3.11+)
```bash
cd backend
# Setup (first time)
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install -e .
# Install dev dependencies
uv pip install -e ".[dev]"
# Run FastAPI HTTP server (for UI)
uv run uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
# Run MCP STDIO server (for Claude Desktop/Code)
uv run python src/mcp/server.py
# Run MCP HTTP server (for remote clients with JWT)
uv run python src/mcp/server.py --http --port 8001
# Tests
uv run pytest # All tests
uv run pytest tests/unit # Unit tests only
uv run pytest tests/integration # Integration tests
uv run pytest -k test_vault_write # Single test pattern
uv run pytest -v # Verbose output
uv run pytest --lf # Last failed tests
```
### Frontend (Node 18+, React + Vite)
```bash
cd frontend
# Setup (first time)
npm install
# Development server
npm run dev # Start Vite dev server (http://localhost:5173)
# Build
npm run build # TypeScript compile + Vite build to dist/
# Lint
npm run lint # ESLint check
# Preview production build
npm run preview # Serve dist/ (after npm run build)
```
### Database Initialization
```bash
# Backend database is auto-initialized on first run
# Manual reset (WARNING: destroys all data)
cd backend
rm -f ../data/index.db
uv run python -c "from src.services.database import DatabaseService; DatabaseService().initialize()"
```
## Architecture Deep Dive
### Backend Service Layers
**3-tier architecture**:
1. **Models** (`backend/src/models/`): Pydantic schemas for validation
- `note.py`: Note, NoteMetadata, NoteSummary
- `user.py`: User, UserProfile
- `search.py`: SearchResult, SearchQuery
- `index.py`: IndexHealth
- `auth.py`: TokenRequest, TokenResponse
2. **Services** (`backend/src/services/`): Business logic
- `vault.py`: Filesystem operations (read/write/list/delete notes)
- `validate_note_path()`: Path security (no `..`, max 256 chars, Unix separators)
- `sanitize_path()`: Resolves and enforces vault root boundary
- `indexer.py`: SQLite FTS5 + metadata tracking
- `index_note()`: Updates metadata, FTS, tags, links (synchronous on every write)
- `search_notes()`: BM25 ranking with title 3x weight, body 1x, recency bonus
- `get_backlinks()`: Follows link graph (note β sources that reference it)
- `auth.py`: JWT + HF OAuth integration
- `create_access_token()`: Issues JWT with sub=user_id, exp=90days
- `verify_token()`: Validates JWT and extracts user_id
- `config.py`: Env var management (MODE, JWT_SECRET_KEY, VAULT_BASE_DIR, etc.)
- `database.py`: SQLite connection manager + schema DDL
3. **API/MCP** (`backend/src/api/` and `backend/src/mcp/`):
- `api/routes/`: FastAPI endpoints (18 routes: auth, notes CRUD, search, backlinks, tags, index health/rebuild, graph, demo, system)
- `api/middleware/auth_middleware.py`: JWT Bearer token validation
- `mcp/server.py`: FastMCP tools (7 tools: list, read, write, delete, search, backlinks, tags)
**Critical Path Validation** (in `vault.py`):
- All note paths MUST pass `validate_note_path()` (returns `(bool, str)` tuple)
- Then `sanitize_path()` resolves and ensures no vault escape
- Failure = 400 Bad Request with specific error message
### SQLite Index Schema
5 tables (see `backend/src/services/database.py`):
1. **note_metadata**: Version tracking, size, timestamps (per note)
2. **note_fts**: Contentless FTS5 with porter tokenizer, `prefix='2 3'` for autocomplete
3. **note_tags**: Many-to-many (user_id, note_path, tag)
4. **note_links**: Link graph (source_path β target_path, is_resolved flag)
5. **index_health**: Aggregate stats (note_count, last_full_rebuild, last_incremental_update)
**Indexer Update Flow** (in `indexer.py`):
```
write_note() β vault.write_note() β indexer.index_note()
β
[metadata table: version++]
[FTS table: re-insert title+body]
[tags table: clear + re-insert]
[links table: extract wikilinks, resolve, update backlinks]
[health table: note_count++, last_incremental_update=now]
```
### Wikilink Resolution Algorithm
In `indexer.py` (`resolve_wikilink` logic):
1. Normalize link text to slug: `normalize_slug("API Design")` β `"api-design"`
2. Find all notes where slug matches `normalize_slug(title)` or `normalize_slug(filename_stem)`
3. If multiple matches:
- Prefer same folder as source note
- Else lexicographically smallest path (ASCII sort)
4. Store in `note_links` table with `is_resolved=1` (or `0` if no match)
**Broken links** are tracked (is_resolved=0) and can be queried for UI "Create note" affordance.
### MCP Server Modes
**STDIO** (`python src/mcp/server.py`):
- For Claude Desktop/Code local integration
- Uses `LOCAL_USER_ID` from env (default: "local-dev")
- No authentication
**HTTP** (`python src/mcp/server.py --http --port 8001`):
- For remote clients (HF Space deployment)
- Requires `Authorization: Bearer <jwt>` header
- JWT validated β user_id extracted β scoped to that user's vault
**Endpoint**: Tools defined in `mcp/server.py` with FastMCP decorators (`@mcp.tool`)
### Frontend Architecture
**Component Hierarchy**:
```
App.tsx (main layout)
βββ DirectoryTree.tsx (left sidebar: vault explorer with virtualization)
βββ NoteViewer.tsx (right pane: read mode, react-markdown rendering)
βββ NoteEditor.tsx (right pane: edit mode, split view with live preview)
βββ SearchBar.tsx (debounced search with dropdown results)
βββ AuthFlow.tsx (HF OAuth login, token management)
```
**Key Libraries**:
- `react-markdown`: Markdown rendering with wikilink custom renderer
- `shadcn/ui`: UI components (Tree, ScrollArea, Button, Textarea, Dialog)
- `lib/wikilink.ts`: Parse `[[...]]` + resolve via GET /api/backlinks
- `services/api.ts`: Fetch wrapper with Bearer token injection
**Wikilink Rendering** (in `NoteViewer.tsx`):
- Custom `react-markdown` renderer for links
- Detect `[[Note Name]]` pattern β fetch backlinks β resolve to path β make clickable
- Broken links styled differently (e.g., red/dashed underline)
### Version Conflict Flow (Optimistic Concurrency)
**UI Edit Scenario**:
1. User opens note β GET /api/notes/{path} β receives `{..., version: 5}`
2. User edits β clicks Save β PUT /api/notes/{path} with `{"if_version": 5, ...}`
3. Backend checks: if current version != 5 β return 409 Conflict
4. UI shows "Note changed, please reload" message
**MCP Write**: No version check, always succeeds (last-write-wins).
## Environment Configuration
See `.env.example` for all variables. Key settings:
- **MODE**: `local` (single-user, no OAuth) or `space` (HF multi-tenant)
- **JWT_SECRET_KEY**: Generate with `python -c "import secrets; print(secrets.token_urlsafe(32))"`
- **VAULT_BASE_DIR**: Where vaults are stored (e.g., `./data/vaults`)
- **DB_PATH**: SQLite database file (e.g., `./data/index.db`)
- **LOCAL_USER_ID**: Default user for local mode (default: `local-dev`)
**HF Space variables** (only needed when MODE=space):
- HF_OAUTH_CLIENT_ID, HF_OAUTH_CLIENT_SECRET, HF_SPACE_HOST
## Constraints & Limits
- **Note size**: 1 MiB max (enforced in vault.py)
- **Vault limit**: 5,000 notes per user (configurable in indexer.py)
- **Path length**: 256 chars max (validated in vault.py)
- **Wikilink syntax**: Only `[[wikilink]]` supported (no aliases like `[[link|alias]]`)
## Performance Targets
- MCP operations: <500ms for 1,000-note vaults
- UI directory load: <2s
- Note render: <1s
- Search: <1s for 5,000 notes
- Index rebuild: <30s for 1,000 notes
## SpecKit Workflow (in .specify/)
This repo uses the SpecKit methodology for feature planning:
- **specs/###-feature-name/**: Feature documentation
- `spec.md`: User stories, requirements, success criteria
- `plan.md`: Tech stack, architecture, structure
- `data-model.md`: Entities, schemas, validation
- `contracts/`: OpenAPI + MCP tool schemas
- `tasks.md`: Implementation task checklist
- **Slash commands**: `/speckit.specify`, `/speckit.plan`, `/speckit.tasks`, `/speckit.implement`
- **Scripts**: `.specify/scripts/bash/` (feature scaffolding, context updates)
Current active feature: `001-obsidian-docs-viewer`
## MCP Client Configuration
**Claude Desktop** (STDIO, local mode):
```json
{
"mcpServers": {
"document-mcp": {
"command": "uv",
"args": ["run", "python", "src/mcp/server.py"],
"cwd": "/absolute/path/to/Document-MCP/backend"
}
}
}
```
**Remote HTTP** (HF Space with JWT):
```json
{
"mcpServers": {
"document-mcp": {
"url": "https://your-space.hf.space/mcp",
"transport": "http",
"headers": {
"Authorization": "Bearer YOUR_JWT_TOKEN"
}
}
}
}
```
Obtain JWT: `POST /api/tokens` after HF OAuth login.
## Active Technologies
- Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI (004-gemini-vault-chat)
- Filesystem vault (existing), LlamaIndex persisted vector store (new, under `data/llamaindex/`) (004-gemini-vault-chat)
- TypeScript 5.x, React 18+ (006-ui-polish)
- localStorage for user preferences (font size, TOC panel state) (006-ui-polish)
## Recent Changes
- 004-gemini-vault-chat: Added Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI
|