# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
**Document-MCP** is a multi-tenant, Obsidian-like documentation viewer with an AI-first workflow. AI agents write and update documentation via MCP (Model Context Protocol), while humans read and edit through a web UI. The system provides per-user vaults with Markdown notes, full-text search (SQLite FTS5), wikilink resolution, tag indexing, and backlink tracking.
**Architecture**: Python backend (FastAPI + FastMCP) + React frontend (Vite + shadcn/ui)
**Key Concepts**:
- **Vault**: Per-user filesystem directory containing .md files
- **MCP Server**: Exposes tools for AI agents (STDIO for local, HTTP for remote with JWT)
- **Indexer**: SQLite FTS5 for full-text search + separate tables for tags/links/metadata
- **Wikilinks**: `[[Note Name]]` resolved via case-insensitive slug matching (prefers same folder, then lexicographic)
- **Optimistic Concurrency**: Version counter in SQLite (not frontmatter); UI sends `if_version`, MCP uses last-write-wins
## Development Commands
### Backend (Python 3.11+)
```bash
cd backend
# Setup (first time)
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install -e .
# Install dev dependencies
uv pip install -e ".[dev]"
# Run FastAPI HTTP server (for UI)
uv run uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
# Run MCP STDIO server (for Claude Desktop/Code)
uv run python src/mcp/server.py
# Run MCP HTTP server (for remote clients with JWT)
uv run python src/mcp/server.py --http --port 8001
# Tests
uv run pytest # All tests
uv run pytest tests/unit # Unit tests only
uv run pytest tests/integration # Integration tests
uv run pytest -k test_vault_write # Single test pattern
uv run pytest -v # Verbose output
uv run pytest --lf # Last failed tests
```
### Frontend (Node 18+, React + Vite)
```bash
cd frontend
# Setup (first time)
npm install
# Development server
npm run dev # Start Vite dev server (http://localhost:5173)
# Build
npm run build # TypeScript compile + Vite build to dist/
# Lint
npm run lint # ESLint check
# Preview production build
npm run preview # Serve dist/ (after npm run build)
```
### Database Initialization
```bash
# Backend database is auto-initialized on first run
# Manual reset (WARNING: destroys all data)
cd backend
rm -f ../data/index.db
uv run python -c "from src.services.database import DatabaseService; DatabaseService().initialize()"
```
## Architecture Deep Dive
### Backend Service Layers
**3-tier architecture**:
1. **Models** (`backend/src/models/`): Pydantic schemas for validation
- `note.py`: Note, NoteMetadata, NoteSummary
- `user.py`: User, UserProfile
- `search.py`: SearchResult, SearchQuery
- `index.py`: IndexHealth
- `auth.py`: TokenRequest, TokenResponse
2. **Services** (`backend/src/services/`): Business logic
- `vault.py`: Filesystem operations (read/write/list/delete notes)
- `validate_note_path()`: Path security (no `..`, max 256 chars, Unix separators)
- `sanitize_path()`: Resolves and enforces vault root boundary
- `indexer.py`: SQLite FTS5 + metadata tracking
- `index_note()`: Updates metadata, FTS, tags, links (synchronous on every write)
- `search_notes()`: BM25 ranking with title 3x weight, body 1x, recency bonus
- `get_backlinks()`: Follows link graph (note → sources that reference it)
- `auth.py`: JWT + HF OAuth integration
- `create_access_token()`: Issues JWT with `sub=user_id` and a 90-day expiry (`exp`)
- `verify_token()`: Validates JWT and extracts user_id
- `config.py`: Env var management (MODE, JWT_SECRET_KEY, VAULT_BASE_DIR, etc.)
- `database.py`: SQLite connection manager + schema DDL
3. **API/MCP** (`backend/src/api/` and `backend/src/mcp/`):
- `api/routes/`: FastAPI endpoints (18 routes: auth, notes CRUD, search, backlinks, tags, index health/rebuild, graph, demo, system)
- `api/middleware/auth_middleware.py`: JWT Bearer token validation
- `mcp/server.py`: FastMCP tools (7 tools: list, read, write, delete, search, backlinks, tags)
**Critical Path Validation** (in `vault.py`):
- All note paths MUST pass `validate_note_path()` (returns `(bool, str)` tuple)
- Then `sanitize_path()` resolves and ensures no vault escape
- Failure = 400 Bad Request with specific error message
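The two-step check above can be sketched as follows. This is a minimal illustration, not the code in `vault.py`: the function names and the `(bool, str)` return shape come from this document, while the exact rules (rejecting empty paths, matching `..` per path segment) and the error messages are assumptions.

```python
from pathlib import Path

MAX_PATH_LENGTH = 256  # limit stated in this document


def validate_note_path(note_path: str) -> tuple[bool, str]:
    """Return (ok, message); message is empty when the path is acceptable."""
    if not note_path:  # assumption: empty paths are rejected
        return False, "Path must not be empty"
    if len(note_path) > MAX_PATH_LENGTH:
        return False, f"Path exceeds {MAX_PATH_LENGTH} characters"
    if "\\" in note_path:
        return False, "Path must use Unix separators (/)"
    if ".." in note_path.split("/"):  # assumption: checked per segment
        return False, "Path must not contain '..'"
    return True, ""


def sanitize_path(vault_root: Path, note_path: str) -> Path:
    """Resolve note_path under vault_root and refuse anything that escapes it."""
    root = vault_root.resolve()
    candidate = (root / note_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError("Path escapes the vault root")
    return candidate
```

A route handler would call `validate_note_path()` first and turn a `False` result (or a `ValueError` from `sanitize_path()`) into the 400 response described above.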
### SQLite Index Schema
5 tables (see `backend/src/services/database.py`):
1. **note_metadata**: Version tracking, size, timestamps (per note)
2. **note_fts**: Contentless FTS5 with porter tokenizer, `prefix='2 3'` for autocomplete
3. **note_tags**: Many-to-many (user_id, note_path, tag)
4. **note_links**: Link graph (source_path → target_path, is_resolved flag)
5. **index_health**: Aggregate stats (note_count, last_full_rebuild, last_incremental_update)
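As a rough illustration of the `note_fts` table and the weighted BM25 ranking mentioned for `search_notes()`, the sketch below builds a contentless FTS5 table in memory and ranks matches with title weight 3 and body weight 1. It assumes a SQLite build with FTS5 enabled. The column names, the `search_notes` signature, and the sample rows are assumptions; the recency bonus is left out, and a contentless table returns only rowids, so mapping them back to note paths would happen elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Contentless FTS5 table: only the index is stored, not the text itself.
conn.execute(
    "CREATE VIRTUAL TABLE note_fts USING fts5("
    "title, body, content='', tokenize='porter', prefix='2 3')"
)

# Hypothetical sample notes, keyed by rowid.
sample_notes = {
    1: ("SQLite indexing", "notes about storage layers"),
    2: ("Storage notes", "how sqlite handles text"),
    3: ("Release checklist", "steps before each deploy"),
    4: ("Team handbook", "onboarding and review habits"),
    5: ("Meeting log", "weekly planning and followups"),
}
with conn:
    conn.executemany(
        "INSERT INTO note_fts (rowid, title, body) VALUES (?, ?, ?)",
        [(rowid, title, body) for rowid, (title, body) in sample_notes.items()],
    )


def search_notes(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[int]:
    """Return matching rowids, best match first.

    bm25() returns lower (more negative) values for better matches, so an
    ascending ORDER BY puts the best match first. The weights follow the
    column order: title = 3.0, body = 1.0.
    """
    rows = conn.execute(
        "SELECT rowid FROM note_fts WHERE note_fts MATCH ? "
        "ORDER BY bm25(note_fts, 3.0, 1.0) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [rowid for (rowid,) in rows]
```

With these rows, a search for `sqlite` matches notes 1 and 2; note 1 ranks first because its hit is in the more heavily weighted title column.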
**Indexer Update Flow** (in `indexer.py`):
```
write_note() → vault.write_note() → indexer.index_note()
↓
[metadata table: version++]
[FTS table: re-insert title+body]
[tags table: clear + re-insert]
[links table: extract wikilinks, resolve, update backlinks]
[health table: note_count++, last_incremental_update=now]
```
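The flow above can be sketched for the metadata, tags, and links steps as one transaction. This is an assumption-laden illustration rather than the real `index_note()`: the table columns are simplified, the function signature is hypothetical, and the FTS and health updates are omitted.

```python
import re
import sqlite3

# Matches [[Note Name]] only; the [[link|alias]] form is deliberately excluded.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)\]\]")

# Simplified, hypothetical schema for the sketch.
SCHEMA = """
CREATE TABLE note_metadata (
    user_id TEXT, note_path TEXT, version INTEGER NOT NULL,
    PRIMARY KEY (user_id, note_path)
);
CREATE TABLE note_tags (user_id TEXT, note_path TEXT, tag TEXT);
CREATE TABLE note_links (user_id TEXT, source_path TEXT, link_text TEXT);
"""


def index_note(conn: sqlite3.Connection, user_id: str, note_path: str,
               body: str, tags: list[str]) -> int:
    """Refresh the index rows for one note and return its new version."""
    key = (user_id, note_path)
    with conn:  # commit on success, roll back on error
        # Metadata: insert at version 1, or increment the existing version.
        conn.execute(
            "INSERT INTO note_metadata (user_id, note_path, version) VALUES (?, ?, 1) "
            "ON CONFLICT (user_id, note_path) DO UPDATE SET version = version + 1",
            key,
        )
        # Tags: clear, then re-insert.
        conn.execute("DELETE FROM note_tags WHERE user_id = ? AND note_path = ?", key)
        conn.executemany(
            "INSERT INTO note_tags VALUES (?, ?, ?)", [(*key, tag) for tag in tags]
        )
        # Links: clear, then re-insert every wikilink found in the body.
        conn.execute("DELETE FROM note_links WHERE user_id = ? AND source_path = ?", key)
        conn.executemany(
            "INSERT INTO note_links VALUES (?, ?, ?)",
            [(*key, text.strip()) for text in WIKILINK_RE.findall(body)],
        )
        (version,) = conn.execute(
            "SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?", key
        ).fetchone()
    return version
```

Clearing and re-inserting tags and links on every write keeps the sketch simple and guarantees that stale rows from a previous version never survive.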
### Wikilink Resolution Algorithm
In `indexer.py` (`resolve_wikilink` logic):
1. Normalize link text to slug: `normalize_slug("API Design")` → `"api-design"`
2. Find all notes where slug matches `normalize_slug(title)` or `normalize_slug(filename_stem)`
3. If multiple matches:
- Prefer same folder as source note
- Else lexicographically smallest path (ASCII sort)
4. Store in `note_links` table with `is_resolved=1` (or `0` if no match)
**Broken links** are tracked (is_resolved=0) and can be queried for UI "Create note" affordance.
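The resolution steps can be sketched in a few lines. The slug rule here (lowercase, runs of non-alphanumerics collapsed to `-`) is an assumption that merely reproduces the `"API Design"` example; the `notes` mapping of path to title and the tie-break among several same-folder matches are also hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def normalize_slug(text: str) -> str:
    """Lowercase and collapse non-alphanumeric runs to '-' (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def resolve_wikilink(link_text: str, source_path: str,
                     notes: dict[str, str]) -> Optional[str]:
    """Resolve link_text to a note path, or None for a broken link.

    notes maps each note path to its title (hypothetical input shape).
    """
    wanted = normalize_slug(link_text)
    matches = [
        path for path, title in notes.items()
        if wanted in (normalize_slug(title), normalize_slug(PurePosixPath(path).stem))
    ]
    if not matches:
        return None  # would be stored with is_resolved=0
    source_folder = PurePosixPath(source_path).parent
    same_folder = [p for p in matches if PurePosixPath(p).parent == source_folder]
    # Prefer the source note's folder; otherwise take the smallest path.
    return min(same_folder or matches)
```

For example, with `guides/api-design.md` and `archive/api-design.md` both titled "API Design", a link from `guides/setup.md` resolves to the `guides/` copy, while a link from a top-level `index.md` falls back to `archive/api-design.md`.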
### MCP Server Modes
**STDIO** (`python src/mcp/server.py`):
- For Claude Desktop/Code local integration
- Uses `LOCAL_USER_ID` from env (default: "local-dev")
- No authentication
**HTTP** (`python src/mcp/server.py --http --port 8001`):
- For remote clients (HF Space deployment)
- Requires `Authorization: Bearer <jwt>` header
- JWT validated → user_id extracted → scoped to that user's vault
**Endpoint**: Tools defined in `mcp/server.py` with FastMCP decorators (`@mcp.tool`)
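To make the HTTP-mode token path concrete, here is a standard-library sketch of issuing and verifying an HS256 JWT with `sub=user_id` and a 90-day expiry. It is illustrative only: the HS256 algorithm, the `iat` claim, the `now` parameter, and the `user_id_from_header` helper are assumptions, and the project's `auth.py` may well rely on a JWT library instead, which is the safer choice in production.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

TOKEN_LIFETIME_SECONDS = 90 * 24 * 60 * 60  # 90 days, as stated in this document


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_access_token(user_id: str, secret: str, now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": issued, "exp": issued + TOKEN_LIFETIME_SECONDS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> str:
    """Return the user_id from a valid token; raise ValueError otherwise."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("Invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("Unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("Token expired")
    return payload["sub"]


def user_id_from_header(authorization: str, secret: str) -> str:
    """Extract the user_id from an 'Authorization: Bearer <jwt>' header value."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise ValueError("Expected 'Bearer <jwt>'")
    return verify_token(token, secret)
```

The returned `user_id` is what scopes every subsequent tool call to that user's vault directory.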
### Frontend Architecture
**Component Hierarchy**:
```
App.tsx (main layout)
├── DirectoryTree.tsx (left sidebar: vault explorer with virtualization)
├── NoteViewer.tsx (right pane: read mode, react-markdown rendering)
├── NoteEditor.tsx (right pane: edit mode, split view with live preview)
├── SearchBar.tsx (debounced search with dropdown results)
└── AuthFlow.tsx (HF OAuth login, token management)
```
**Key Libraries**:
- `react-markdown`: Markdown rendering with wikilink custom renderer
- `shadcn/ui`: UI components (Tree, ScrollArea, Button, Textarea, Dialog)
- `lib/wikilink.ts`: Parse `[[...]]` + resolve via GET /api/backlinks
- `services/api.ts`: Fetch wrapper with Bearer token injection
**Wikilink Rendering** (in `NoteViewer.tsx`):
- Custom `react-markdown` renderer for links
- Detect `[[Note Name]]` pattern → fetch backlinks → resolve to path → make clickable
- Broken links styled differently (e.g., red/dashed underline)
### Version Conflict Flow (Optimistic Concurrency)
**UI Edit Scenario**:
1. User opens note → GET /api/notes/{path} → receives `{..., version: 5}`
2. User edits → clicks Save → PUT /api/notes/{path} with `{"if_version": 5, ...}`
3. Backend checks: if current version != 5 → return 409 Conflict
4. UI shows "Note changed, please reload" message
**MCP Write**: No version check, always succeeds (last-write-wins).
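The difference between the two write paths can be sketched with a conditional `UPDATE`. The `VersionConflict` exception, the function name, and the reduced `note_metadata` columns are assumptions for illustration; the real check lives in the backend services.

```python
import sqlite3
from typing import Optional


class VersionConflict(Exception):
    """Would be translated to HTTP 409 Conflict by the API layer."""


def bump_version(conn: sqlite3.Connection, user_id: str, note_path: str,
                 if_version: Optional[int] = None) -> int:
    """Increment a note's version and return the new value.

    UI writes pass if_version, so the update only applies when the stored
    version still matches. MCP writes pass None (last-write-wins).
    """
    sql = ("UPDATE note_metadata SET version = version + 1 "
           "WHERE user_id = ? AND note_path = ?")
    params: list = [user_id, note_path]
    if if_version is not None:
        sql += " AND version = ?"
        params.append(if_version)
    with conn:
        updated = conn.execute(sql, params).rowcount
    if updated == 0:
        # A missing note also lands here; a fuller version would
        # distinguish "not found" from a genuine version mismatch.
        raise VersionConflict(f"{note_path}: expected version {if_version}")
    (version,) = conn.execute(
        "SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?",
        (user_id, note_path),
    ).fetchone()
    return version
```

Putting the version comparison inside the `UPDATE` itself means the check and the increment happen in one statement, so two concurrent UI saves cannot both pass the check.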
## Environment Configuration
See `.env.example` for all variables. Key settings:
- **MODE**: `local` (single-user, no OAuth) or `space` (HF multi-tenant)
- **JWT_SECRET_KEY**: Generate with `python -c "import secrets; print(secrets.token_urlsafe(32))"`
- **VAULT_BASE_DIR**: Where vaults are stored (e.g., `./data/vaults`)
- **DB_PATH**: SQLite database file (e.g., `./data/index.db`)
- **LOCAL_USER_ID**: Default user for local mode (default: `local-dev`)
**HF Space variables** (only needed when MODE=space):
- HF_OAUTH_CLIENT_ID, HF_OAUTH_CLIENT_SECRET, HF_SPACE_HOST
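A loader for these variables might look like the sketch below. The variable names come from this document, but the `Settings` class, the `load_settings` function, the fallback values for `MODE`, `VAULT_BASE_DIR`, and `DB_PATH`, and the rule that `space` mode requires the JWT and HF variables are assumptions; check `config.py` and `.env.example` for the actual behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these must all be set when MODE=space.
SPACE_REQUIRED = (
    "JWT_SECRET_KEY",
    "HF_OAUTH_CLIENT_ID",
    "HF_OAUTH_CLIENT_SECRET",
    "HF_SPACE_HOST",
)


@dataclass(frozen=True)
class Settings:
    mode: str
    jwt_secret_key: Optional[str]
    vault_base_dir: str
    db_path: str
    local_user_id: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    mode = env.get("MODE", "local")  # assumed default
    if mode not in ("local", "space"):
        raise ValueError(f"MODE must be 'local' or 'space', got {mode!r}")
    if mode == "space":
        missing = [name for name in SPACE_REQUIRED if not env.get(name)]
        if missing:
            raise ValueError(f"Missing variables for space mode: {', '.join(missing)}")
    return Settings(
        mode=mode,
        jwt_secret_key=env.get("JWT_SECRET_KEY"),
        vault_base_dir=env.get("VAULT_BASE_DIR", "./data/vaults"),  # example value
        db_path=env.get("DB_PATH", "./data/index.db"),  # example value
        local_user_id=env.get("LOCAL_USER_ID", "local-dev"),
    )
```

Failing fast on a missing `space`-mode variable surfaces misconfiguration at startup rather than at the first OAuth request.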
## Constraints & Limits
- **Note size**: 1 MiB max (enforced in vault.py)
- **Vault limit**: 5,000 notes per user (configurable in indexer.py)
- **Path length**: 256 chars max (validated in vault.py)
- **Wikilink syntax**: Only `[[wikilink]]` supported (no aliases like `[[link|alias]]`)
## Performance Targets
- MCP operations: <500ms for 1,000-note vaults
- UI directory load: <2s
- Note render: <1s
- Search: <1s for 5,000 notes
- Index rebuild: <30s for 1,000 notes
## SpecKit Workflow (in .specify/)
This repo uses the SpecKit methodology for feature planning:
- **specs/###-feature-name/**: Feature documentation
- `spec.md`: User stories, requirements, success criteria
- `plan.md`: Tech stack, architecture, structure
- `data-model.md`: Entities, schemas, validation
- `contracts/`: OpenAPI + MCP tool schemas
- `tasks.md`: Implementation task checklist
- **Slash commands**: `/speckit.specify`, `/speckit.plan`, `/speckit.tasks`, `/speckit.implement`
- **Scripts**: `.specify/scripts/bash/` (feature scaffolding, context updates)
Current active feature: `001-obsidian-docs-viewer`
## MCP Client Configuration
**Claude Desktop** (STDIO, local mode):
```json
{
"mcpServers": {
"document-mcp": {
"command": "uv",
"args": ["run", "python", "src/mcp/server.py"],
"cwd": "/absolute/path/to/Document-MCP/backend"
}
}
}
```
**Remote HTTP** (HF Space with JWT):
```json
{
"mcpServers": {
"document-mcp": {
"url": "https://your-space.hf.space/mcp",
"transport": "http",
"headers": {
"Authorization": "Bearer YOUR_JWT_TOKEN"
}
}
}
}
```
Obtain JWT: `POST /api/tokens` after HF OAuth login.
## Active Technologies
- Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI (004-gemini-vault-chat)
- Filesystem vault (existing), LlamaIndex persisted vector store (new, under `data/llamaindex/`) (004-gemini-vault-chat)
- TypeScript 5.x, React 18+ (006-ui-polish)
- localStorage for user preferences (font size, TOC panel state) (006-ui-polish)
## Recent Changes
- 004-gemini-vault-chat: Added Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI