Vault.MCP / CLAUDE.md
bigwolfeman
init
3c7dbf6

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Document-MCP is a multi-tenant Obsidian-like documentation viewer with AI-first workflow. AI agents write/update documentation via MCP (Model Context Protocol), while humans read and edit through a web UI. The system provides per-user vaults with Markdown notes, full-text search (SQLite FTS5), wikilink resolution, tag indexing, and backlink tracking.

Architecture: Python backend (FastAPI + FastMCP) + React frontend (Vite + shadcn/ui)

Key Concepts:

  • Vault: Per-user filesystem directory containing .md files
  • MCP Server: Exposes tools for AI agents (STDIO for local, HTTP for remote with JWT)
  • Indexer: SQLite FTS5 for full-text search + separate tables for tags/links/metadata
  • Wikilinks: [[Note Name]] resolved via case-insensitive slug matching (prefers same folder, then lexicographic)
  • Optimistic Concurrency: Version counter in SQLite (not frontmatter); UI sends if_version, MCP uses last-write-wins

Development Commands

Backend (Python 3.11+)

cd backend

# Setup (first time)
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .

# Install dev dependencies
uv pip install -e ".[dev]"

# Run FastAPI HTTP server (for UI)
uv run uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000

# Run MCP STDIO server (for Claude Desktop/Code)
uv run python src/mcp/server.py

# Run MCP HTTP server (for remote clients with JWT)
uv run python src/mcp/server.py --http --port 8001

# Tests
uv run pytest                          # All tests
uv run pytest tests/unit               # Unit tests only
uv run pytest tests/integration        # Integration tests
uv run pytest -k test_vault_write      # Single test pattern
uv run pytest -v                       # Verbose output
uv run pytest --lf                     # Last failed tests

Frontend (Node 18+, React + Vite)

cd frontend

# Setup (first time)
npm install

# Development server
npm run dev                   # Start Vite dev server (http://localhost:5173)

# Build
npm run build                 # TypeScript compile + Vite build to dist/

# Lint
npm run lint                  # ESLint check

# Preview production build
npm run preview               # Serve dist/ (after npm run build)

Database Initialization

# Backend database is auto-initialized on first run
# Manual reset (WARNING: destroys all data)
cd backend
rm -f ../data/index.db
uv run python -c "from src.services.database import DatabaseService; DatabaseService().initialize()"

Architecture Deep Dive

Backend Service Layers

3-tier architecture:

  1. Models (backend/src/models/): Pydantic schemas for validation

    • note.py: Note, NoteMetadata, NoteSummary
    • user.py: User, UserProfile
    • search.py: SearchResult, SearchQuery
    • index.py: IndexHealth
    • auth.py: TokenRequest, TokenResponse
  2. Services (backend/src/services/): Business logic

    • vault.py: Filesystem operations (read/write/list/delete notes)
      • validate_note_path(): Path security (no .., max 256 chars, Unix separators)
      • sanitize_path(): Resolves and enforces vault root boundary
    • indexer.py: SQLite FTS5 + metadata tracking
      • index_note(): Updates metadata, FTS, tags, links (synchronous on every write)
      • search_notes(): BM25 ranking with title 3x weight, body 1x, recency bonus
      • get_backlinks(): Follows link graph (note β†’ sources that reference it)
    • auth.py: JWT + HF OAuth integration
      • create_access_token(): Issues JWT with sub=user_id, exp=90days
      • verify_token(): Validates JWT and extracts user_id
    • config.py: Env var management (MODE, JWT_SECRET_KEY, VAULT_BASE_DIR, etc.)
    • database.py: SQLite connection manager + schema DDL
  3. API/MCP (backend/src/api/ and backend/src/mcp/):

    • api/routes/: FastAPI endpoints (18 routes: auth, notes CRUD, search, backlinks, tags, index health/rebuild, graph, demo, system)
    • api/middleware/auth_middleware.py: JWT Bearer token validation
    • mcp/server.py: FastMCP tools (7 tools: list, read, write, delete, search, backlinks, tags)

Critical Path Validation (in vault.py):

  • All note paths MUST pass validate_note_path() (returns (bool, str) tuple)
  • Then sanitize_path() resolves and ensures no vault escape
  • Failure = 400 Bad Request with specific error message

SQLite Index Schema

5 tables (see backend/src/services/database.py):

  1. note_metadata: Version tracking, size, timestamps (per note)
  2. note_fts: Contentless FTS5 with porter tokenizer, prefix='2 3' for autocomplete
  3. note_tags: Many-to-many (user_id, note_path, tag)
  4. note_links: Link graph (source_path β†’ target_path, is_resolved flag)
  5. index_health: Aggregate stats (note_count, last_full_rebuild, last_incremental_update)

Indexer Update Flow (in indexer.py):

write_note() β†’ vault.write_note() β†’ indexer.index_note()
                                  ↓
                            [metadata table: version++]
                            [FTS table: re-insert title+body]
                            [tags table: clear + re-insert]
                            [links table: extract wikilinks, resolve, update backlinks]
                            [health table: note_count++, last_incremental_update=now]

Wikilink Resolution Algorithm

In indexer.py (resolve_wikilink logic):

  1. Normalize link text to slug: normalize_slug("API Design") β†’ "api-design"
  2. Find all notes where slug matches normalize_slug(title) or normalize_slug(filename_stem)
  3. If multiple matches:
    • Prefer same folder as source note
    • Else lexicographically smallest path (ASCII sort)
  4. Store in note_links table with is_resolved=1 (or 0 if no match)

Broken links are tracked (is_resolved=0) and can be queried for UI "Create note" affordance.

MCP Server Modes

STDIO (python src/mcp/server.py):

  • For Claude Desktop/Code local integration
  • Uses LOCAL_USER_ID from env (default: "local-dev")
  • No authentication

HTTP (python src/mcp/server.py --http --port 8001):

  • For remote clients (HF Space deployment)
  • Requires Authorization: Bearer <jwt> header
  • JWT validated β†’ user_id extracted β†’ scoped to that user's vault

Endpoint: Tools defined in mcp/server.py with FastMCP decorators (@mcp.tool)

Frontend Architecture

Component Hierarchy:

App.tsx (main layout)
β”œβ”€β”€ DirectoryTree.tsx (left sidebar: vault explorer with virtualization)
β”œβ”€β”€ NoteViewer.tsx (right pane: read mode, react-markdown rendering)
β”œβ”€β”€ NoteEditor.tsx (right pane: edit mode, split view with live preview)
β”œβ”€β”€ SearchBar.tsx (debounced search with dropdown results)
└── AuthFlow.tsx (HF OAuth login, token management)

Key Libraries:

  • react-markdown: Markdown rendering with wikilink custom renderer
  • shadcn/ui: UI components (Tree, ScrollArea, Button, Textarea, Dialog)
  • lib/wikilink.ts: Parse [[...]] + resolve via GET /api/backlinks
  • services/api.ts: Fetch wrapper with Bearer token injection

Wikilink Rendering (in NoteViewer.tsx):

  • Custom react-markdown renderer for links
  • Detect [[Note Name]] pattern β†’ fetch backlinks β†’ resolve to path β†’ make clickable
  • Broken links styled differently (e.g., red/dashed underline)

Version Conflict Flow (Optimistic Concurrency)

UI Edit Scenario:

  1. User opens note β†’ GET /api/notes/{path} β†’ receives {..., version: 5}
  2. User edits β†’ clicks Save β†’ PUT /api/notes/{path} with {"if_version": 5, ...}
  3. Backend checks: if current version != 5 β†’ return 409 Conflict
  4. UI shows "Note changed, please reload" message

MCP Write: No version check, always succeeds (last-write-wins).

Environment Configuration

See .env.example for all variables. Key settings:

  • MODE: local (single-user, no OAuth) or space (HF multi-tenant)
  • JWT_SECRET_KEY: Generate with python -c "import secrets; print(secrets.token_urlsafe(32))"
  • VAULT_BASE_DIR: Where vaults are stored (e.g., ./data/vaults)
  • DB_PATH: SQLite database file (e.g., ./data/index.db)
  • LOCAL_USER_ID: Default user for local mode (default: local-dev)

HF Space variables (only needed when MODE=space):

  • HF_OAUTH_CLIENT_ID, HF_OAUTH_CLIENT_SECRET, HF_SPACE_HOST

Constraints & Limits

  • Note size: 1 MiB max (enforced in vault.py)
  • Vault limit: 5,000 notes per user (configurable in indexer.py)
  • Path length: 256 chars max (validated in vault.py)
  • Wikilink syntax: Only [[wikilink]] supported (no aliases like [[link|alias]])

Performance Targets

  • MCP operations: <500ms for 1,000-note vaults
  • UI directory load: <2s
  • Note render: <1s
  • Search: <1s for 5,000 notes
  • Index rebuild: <30s for 1,000 notes

SpecKit Workflow (in .specify/)

This repo uses the SpecKit methodology for feature planning:

  • specs/###-feature-name/: Feature documentation
    • spec.md: User stories, requirements, success criteria
    • plan.md: Tech stack, architecture, structure
    • data-model.md: Entities, schemas, validation
    • contracts/: OpenAPI + MCP tool schemas
    • tasks.md: Implementation task checklist
  • Slash commands: /speckit.specify, /speckit.plan, /speckit.tasks, /speckit.implement
  • Scripts: .specify/scripts/bash/ (feature scaffolding, context updates)

Current active feature: 001-obsidian-docs-viewer

MCP Client Configuration

Claude Desktop (STDIO, local mode):

{
  "mcpServers": {
    "document-mcp": {
      "command": "uv",
      "args": ["run", "python", "src/mcp/server.py"],
      "cwd": "/absolute/path/to/Document-MCP/backend"
    }
  }
}

Remote HTTP (HF Space with JWT):

{
  "mcpServers": {
    "document-mcp": {
      "url": "https://your-space.hf.space/mcp",
      "transport": "http",
      "headers": {
        "Authorization": "Bearer YOUR_JWT_TOKEN"
      }
    }
  }
}

Obtain JWT: POST /api/tokens after HF OAuth login.

Active Technologies

  • Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI (004-gemini-vault-chat)
  • Filesystem vault (existing), LlamaIndex persisted vector store (new, under data/llamaindex/) (004-gemini-vault-chat)
  • TypeScript 5.x, React 18+ (006-ui-polish)
  • localStorage for user preferences (font size, TOC panel state) (006-ui-polish)

Recent Changes

  • 004-gemini-vault-chat: Added Python 3.11+ (backend), TypeScript (frontend) + FastAPI, LlamaIndex, llama-index-llms-google-genai, llama-index-embeddings-google-genai, React 18+, Tailwind CSS, Shadcn/UI