# Data Model: Multi-Tenant Obsidian-Like Docs Viewer **Feature Branch**: `001-obsidian-docs-viewer` **Created**: 2025-11-15 **Status**: Draft ## Table of Contents 1. [Overview](#overview) 2. [Entity Relationship Diagram](#entity-relationship-diagram) 3. [Core Entities](#core-entities) 4. [Index Entities](#index-entities) 5. [Authentication Entities](#authentication-entities) 6. [SQLite Schema](#sqlite-schema) 7. [Pydantic Models](#pydantic-models) 8. [TypeScript Type Definitions](#typescript-type-definitions) 9. [Validation Rules](#validation-rules) 10. [State Transitions](#state-transitions) 11. [Relationships and Constraints](#relationships-and-constraints) --- ## Overview This document defines the complete data model for a multi-tenant Obsidian-like documentation viewer. The system stores: - **User accounts** with HF OAuth identity mapping - **Vaults** as per-user directory trees containing Markdown notes - **Notes** with YAML frontmatter, version tracking, and full-text indexing - **Wikilinks** for bidirectional linking between notes - **Tags** for categorization and filtering - **Index metadata** for search optimization and health monitoring **Design principles**: - **Per-user isolation**: All data scoped by `user_id` - **Filesystem-backed**: Notes stored as `.md` files, metadata in SQLite - **Version-controlled**: Integer version counter for optimistic concurrency - **Search-optimized**: SQLite FTS5 for full-text search, separate indexes for tags and links --- ## Entity Relationship Diagram ```mermaid erDiagram USER ||--o{ VAULT : owns VAULT ||--o{ NOTE : contains NOTE ||--o{ WIKILINK : has_outgoing NOTE ||--o{ WIKILINK : has_incoming NOTE ||--o{ TAG : tagged_with USER ||--|| INDEX_HEALTH : tracks USER ||--o{ TOKEN : issues USER { string user_id PK string hf_username string hf_name string hf_avatar_url datetime created } VAULT { string user_id FK string root_path int note_count } NOTE { string user_id FK string note_path PK int version string title json metadata string body datetime created datetime updated int size_bytes } WIKILINK { string user_id FK string source_path FK string target_path FK string link_text bool is_resolved } TAG { string user_id FK string tag_name string note_path FK } INDEX_HEALTH { string user_id PK int note_count datetime last_full_rebuild datetime last_incremental_update } TOKEN { string token_id PK string user_id FK datetime issued_at datetime expires_at string token_type } ``` **Key relationships**: - One user owns one vault (1:1) - One vault contains many notes (1:N) - One note has many outgoing wikilinks (1:N) - One note may be referenced by many backlinks (1:N) - One note can have many tags (N:M via junction table) - One user has one index health record (1:1) - One user can issue many tokens (1:N) --- ## Core Entities ### User Represents an authenticated user with HF OAuth identity. **Attributes**: - `user_id` (string, PK): Internal unique identifier, derived from HF username or UUID - `hf_username` (string, nullable): HuggingFace username (e.g., "alice") - `hf_name` (string, nullable): Display name from HF profile - `hf_avatar_url` (string, nullable): Profile picture URL - `created` (datetime): Account creation timestamp (ISO 8601) **Notes**: - In local mode, `user_id = "local-dev"` with null HF fields - In HF Space mode, `user_id = hf_username` (normalized to lowercase) - `created` timestamp set on first OAuth login (vault initialization) **Lifecycle**: 1. User authenticates via HF OAuth 2. Backend maps HF identity to `user_id` 3. If new user, create vault directory and initialize index 4. Return user info to frontend --- ### Vault A per-user directory tree containing Markdown notes. **Attributes**: - `user_id` (string, FK): Owner of the vault - `root_path` (string): Absolute filesystem path to vault root (e.g., `/data/vaults/alice/`) - `note_count` (int): Cached count of notes in vault (denormalized from index) **Constraints**: - Max 5,000 notes per vault (enforced by FR-008) - Root path must exist and be writable - Directory structure is arbitrary (user-defined nested folders) **Filesystem layout example**: ``` /data/vaults/alice/ ├── README.md ├── api/ │ ├── design.md │ └── endpoints.md ├── guides/ │ ├── setup.md │ └── deployment.md └── notes/ └── meeting-2025-01-15.md ``` --- ### Note A Markdown file with optional YAML frontmatter and body content. **Attributes**: - `user_id` (string, FK): Owner of the note - `note_path` (string, PK): Relative path to vault root, includes `.md` (e.g., `api/design.md`) - `version` (int): Optimistic concurrency version counter (starts at 1, increments on write) - `title` (string): Display title (from frontmatter, first H1, or filename stem) - `metadata` (JSON): Frontmatter key-value pairs (excludes auto-managed fields) - `body` (string): Markdown content (excluding frontmatter) - `created` (datetime, ISO 8601): Creation timestamp (auto-set if not in frontmatter) - `updated` (datetime, ISO 8601): Last modification timestamp (auto-set on every write) - `size_bytes` (int): UTF-8 byte size of full file content (frontmatter + body) **Constraints**: - `note_path` max 256 characters, Unix-style separators (`/`), no `..` allowed - `size_bytes` max 1 MiB (1,048,576 bytes) per FR-007 - `version` stored in index, NOT in frontmatter - `created` and `updated` stored in index, MAY appear in frontmatter (frontmatter is source of truth on read) **Title resolution priority** (FR-006): 1. `metadata.get('title')` from frontmatter 2. First `# Heading` in body (H1 only) 3. Filename stem (e.g., `design.md` → "design") **Metadata fields** (common, but arbitrary): - `tags` (array of strings): Tag names for categorization - `project` (string): Project identifier - `created` (datetime): User-provided creation timestamp - `updated` (datetime): User-provided update timestamp - Custom fields allowed (JSON object) --- ## Index Entities ### Wikilink Represents a bidirectional link between two notes. **Attributes**: - `user_id` (string, FK): Owner of the notes - `source_path` (string, FK): Path of note containing the wikilink - `target_path` (string, nullable, FK): Resolved path of linked note (null if unresolved) - `link_text` (string): Original text from `[[link text]]` - `is_resolved` (bool): True if `target_path` is non-null, false if broken link **Extraction**: - Regex pattern: `\[\[([^\]]+)\]\]` - Extract all matches from note body on every write **Resolution algorithm** (FR-015, FR-016): 1. Normalize `link_text` to slug: lowercase, replace spaces/underscores with dash, strip non-alphanumeric 2. Match normalized slug against: - Normalized filename stems (e.g., `api-design.md` → "api-design") - Normalized frontmatter titles (e.g., `title: "API Design"` → "api-design") 3. If multiple matches: - Prefer same-folder match (e.g., `api/[[design]]` → `api/design.md` over `guides/design.md`) - Tiebreaker: lexicographically smallest path 4. If no match: `target_path = null`, `is_resolved = false` **Slug normalization function**: ```python import re def normalize_slug(text: str) -> str: text = text.lower() text = re.sub(r'[\s_]+', '-', text) # Spaces/underscores → dash text = re.sub(r'[^a-z0-9-]', '', text) # Keep alphanumeric + dash text = re.sub(r'-+', '-', text) # Collapse dashes return text.strip('-') ``` **Backlinks**: - To get backlinks for `note_path`, query: `WHERE target_path = note_path` - Backlinks are automatically updated when any note's wikilinks change --- ### Tag A metadata label applied to notes for categorization. **Attributes**: - `user_id` (string, FK): Owner of the notes - `note_path` (string, FK): Path of tagged note - `tag_name` (string): Tag identifier (lowercase, alphanumeric + hyphens) **Constraints**: - Many-to-many relationship: one note can have multiple tags, one tag can apply to multiple notes - Tag names normalized: lowercase, strip whitespace - Extracted from frontmatter `tags: [tag1, tag2]` array **Tag count**: - Computed via `COUNT(DISTINCT note_path) GROUP BY tag_name` - Used for tag cloud and filtering UI --- ### Index Health Tracks the state and freshness of per-user indices. **Attributes**: - `user_id` (string, PK): Owner of the index - `note_count` (int): Total number of notes indexed - `last_full_rebuild` (datetime, nullable, ISO 8601): Timestamp of last full index rebuild - `last_incremental_update` (datetime, nullable, ISO 8601): Timestamp of last incremental update (write/delete) **Usage**: - Displayed in UI as index health indicator - Used to detect stale indices (e.g., `note_count` mismatch with actual file count) - Manual rebuild sets `last_full_rebuild = now()` (FR-019) - Every write/delete sets `last_incremental_update = now()` (FR-018) --- ## Authentication Entities ### Token (JWT) A signed JSON Web Token used for API and MCP authentication. **JWT Claims** (payload): - `sub` (string): Subject (user_id) - `iat` (int): Issued at timestamp (Unix epoch) - `exp` (int): Expiration timestamp (Unix epoch, iat + 90 days) **Header**: - `alg: "HS256"`: HMAC SHA-256 signature algorithm - `typ: "JWT"`: Token type **Signature**: - Signed with server secret (env var `JWT_SECRET_KEY`) - Validated on every API/MCP request via `Authorization: Bearer ` header **Token lifecycle**: 1. User authenticates via HF OAuth 2. User calls `POST /api/tokens` to issue JWT 3. Frontend stores token in memory (React context) 4. MCP clients pass token to `auth` parameter (FastMCP HTTP transport) 5. Server validates token on every request, extracts `user_id` from `sub` claim 6. Token expires after 90 days, user must re-authenticate **Example token**: ```json { "header": { "alg": "HS256", "typ": "JWT" }, "payload": { "sub": "alice", "iat": 1736956800, "exp": 1744732800 }, "signature": "" } ``` Encoded: `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhbGljZSIsImlhdCI6MTczNjk1NjgwMCwiZXhwIjoxNzQ0NzMyODAwfQ.` --- ## SQLite Schema Complete DDL for multi-index storage with per-user isolation. ### Core Tables #### note_metadata Stores note metadata for fast lookups and version tracking. ```sql CREATE TABLE IF NOT EXISTS note_metadata ( user_id TEXT NOT NULL, note_path TEXT NOT NULL, version INTEGER NOT NULL DEFAULT 1, title TEXT NOT NULL, created TEXT NOT NULL, -- ISO 8601 timestamp updated TEXT NOT NULL, -- ISO 8601 timestamp size_bytes INTEGER NOT NULL DEFAULT 0, normalized_title_slug TEXT, -- Pre-computed for wikilink resolution normalized_path_slug TEXT, -- Pre-computed for wikilink resolution PRIMARY KEY (user_id, note_path) ); CREATE INDEX idx_metadata_user ON note_metadata(user_id); CREATE INDEX idx_metadata_updated ON note_metadata(user_id, updated DESC); CREATE INDEX idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug); CREATE INDEX idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug); ``` **Notes**: - Composite primary key: `(user_id, note_path)` - `version` starts at 1, increments on every write - `normalized_*_slug` columns enable O(1) wikilink resolution - Index on `updated DESC` for recency-based sorting --- #### note_fts Full-text search index using SQLite FTS5. ```sql CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5( user_id UNINDEXED, note_path UNINDEXED, title, body, content='', -- Contentless (external content pattern) tokenize='porter unicode61', -- Stemming + Unicode support prefix='2 3' -- Prefix indexes for autocomplete ); ``` **Notes**: - `content=''` (contentless): We manually INSERT/DELETE rows, no automatic sync - `UNINDEXED` columns are retrievable but not searchable (used for IDs) - `porter` tokenizer: English stemming (e.g., "running" matches "run") - `prefix='2 3'`: Enables fast `MATCH 'prefix*'` queries (2-char and 3-char prefixes) - Manual row management: On write, `DELETE` old row + `INSERT` new row - Ranking: Use `bm25(note_fts, 3.0, 1.0)` for title weight=3x, body weight=1x **Query example**: ```sql SELECT note_path, title, bm25(note_fts, 3.0, 1.0) AS rank FROM note_fts WHERE user_id = ? AND note_fts MATCH ? ORDER BY rank DESC LIMIT 50; ``` **Safety**: Incoming queries are tokenized into alphanumeric terms (per requirement to split on non-alphanumeric characters), each optionally preserving a trailing `*` for prefix searches, then wrapped in double quotes before being passed to `MATCH`. This neutralizes MATCH operators, trims punctuation such as apostrophes/ampersands, and prevents SQL syntax errors while preserving simple keyword semantics. --- #### note_tags Many-to-many junction table for note-tag relationships. ```sql CREATE TABLE IF NOT EXISTS note_tags ( user_id TEXT NOT NULL, note_path TEXT NOT NULL, tag TEXT NOT NULL, PRIMARY KEY (user_id, note_path, tag) ); CREATE INDEX idx_tags_user_tag ON note_tags(user_id, tag); CREATE INDEX idx_tags_user_path ON note_tags(user_id, note_path); ``` **Notes**: - Composite primary key prevents duplicate tag assignments - Index on `(user_id, tag)` for "all notes with tag X" queries - Index on `(user_id, note_path)` for "all tags for note Y" queries **Query examples**: ```sql -- Get all notes with tag "backend" SELECT DISTINCT note_path, title FROM note_tags t JOIN note_metadata m USING (user_id, note_path) WHERE t.user_id = ? AND t.tag = ? ORDER BY m.updated DESC; -- Get tag counts for user SELECT tag, COUNT(DISTINCT note_path) as count FROM note_tags WHERE user_id = ? GROUP BY tag ORDER BY count DESC; ``` --- #### note_links Stores wikilink graph for backlink navigation. ```sql CREATE TABLE IF NOT EXISTS note_links ( user_id TEXT NOT NULL, source_path TEXT NOT NULL, target_path TEXT, -- NULL if unresolved link_text TEXT NOT NULL, is_resolved INTEGER NOT NULL DEFAULT 0, -- Boolean: 0=broken, 1=resolved PRIMARY KEY (user_id, source_path, link_text) ); CREATE INDEX idx_links_user_source ON note_links(user_id, source_path); CREATE INDEX idx_links_user_target ON note_links(user_id, target_path); CREATE INDEX idx_links_unresolved ON note_links(user_id, is_resolved); ``` **Notes**: - `target_path` is nullable (null = broken link) - `is_resolved` is integer (0 or 1) for SQLite boolean representation - Composite primary key prevents duplicate links from same source with same text - Index on `target_path` enables fast backlink queries **Query examples**: ```sql -- Get backlinks for a note SELECT DISTINCT l.source_path, m.title FROM note_links l JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path WHERE l.user_id = ? AND l.target_path = ? ORDER BY m.updated DESC; -- Get all unresolved links for user SELECT source_path, link_text FROM note_links WHERE user_id = ? AND is_resolved = 0; ``` --- #### index_health Tracks index state and freshness per user. ```sql CREATE TABLE IF NOT EXISTS index_health ( user_id TEXT PRIMARY KEY, note_count INTEGER NOT NULL DEFAULT 0, last_full_rebuild TEXT, -- ISO 8601 timestamp last_incremental_update TEXT -- ISO 8601 timestamp ); ``` **Notes**: - One row per user - `last_full_rebuild` set on manual rebuild (FR-042) - `last_incremental_update` set on every write/delete (FR-018) - `note_count` is denormalized cache for quick health checks --- ### Initialization Script Complete schema initialization: ```sql -- Enable FTS5 extension (usually built-in) -- PRAGMA compile_options; -- Check if FTS5 is available BEGIN TRANSACTION; -- Core metadata table CREATE TABLE IF NOT EXISTS note_metadata ( user_id TEXT NOT NULL, note_path TEXT NOT NULL, version INTEGER NOT NULL DEFAULT 1, title TEXT NOT NULL, created TEXT NOT NULL, updated TEXT NOT NULL, size_bytes INTEGER NOT NULL DEFAULT 0, normalized_title_slug TEXT, normalized_path_slug TEXT, PRIMARY KEY (user_id, note_path) ); CREATE INDEX IF NOT EXISTS idx_metadata_user ON note_metadata(user_id); CREATE INDEX IF NOT EXISTS idx_metadata_updated ON note_metadata(user_id, updated DESC); CREATE INDEX IF NOT EXISTS idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug); CREATE INDEX IF NOT EXISTS idx_metadata_path_slug ON note_metadata(user_id, normalized_path_slug); -- Full-text search index CREATE VIRTUAL TABLE IF NOT EXISTS note_fts USING fts5( user_id UNINDEXED, note_path UNINDEXED, title, body, content='', tokenize='porter unicode61', prefix='2 3' ); -- Tag index CREATE TABLE IF NOT EXISTS note_tags ( user_id TEXT NOT NULL, note_path TEXT NOT NULL, tag TEXT NOT NULL, PRIMARY KEY (user_id, note_path, tag) ); CREATE INDEX IF NOT EXISTS idx_tags_user_tag ON note_tags(user_id, tag); CREATE INDEX IF NOT EXISTS idx_tags_user_path ON note_tags(user_id, note_path); -- Link graph CREATE TABLE IF NOT EXISTS note_links ( user_id TEXT NOT NULL, source_path TEXT NOT NULL, target_path TEXT, link_text TEXT NOT NULL, is_resolved INTEGER NOT NULL DEFAULT 0, PRIMARY KEY (user_id, source_path, link_text) ); CREATE INDEX IF NOT EXISTS idx_links_user_source ON note_links(user_id, source_path); CREATE INDEX IF NOT EXISTS idx_links_user_target ON note_links(user_id, target_path); CREATE INDEX IF NOT EXISTS idx_links_unresolved ON note_links(user_id, is_resolved); -- Index health tracking CREATE TABLE IF NOT EXISTS index_health ( user_id TEXT PRIMARY KEY, note_count INTEGER NOT NULL DEFAULT 0, last_full_rebuild TEXT, last_incremental_update TEXT ); COMMIT; ``` --- ## Pydantic Models Python data models using Pydantic for validation and serialization. ### User Models ```python from pydantic import BaseModel, Field from datetime import datetime from typing import Optional class HFProfile(BaseModel): """HuggingFace OAuth profile information.""" username: str = Field(..., description="HF username") name: Optional[str] = Field(None, description="Display name") avatar_url: Optional[str] = Field(None, description="Profile picture URL") class User(BaseModel): """User account with authentication info.""" user_id: str = Field(..., min_length=1, max_length=64, description="Internal user ID") hf_profile: Optional[HFProfile] = Field(None, description="HF OAuth profile") vault_path: str = Field(..., description="Absolute path to user's vault") created: datetime = Field(..., description="Account creation timestamp") class Config: json_schema_extra = { "example": { "user_id": "alice", "hf_profile": { "username": "alice", "name": "Alice Smith", "avatar_url": "/static-proxy?url=https%3A%2F%2Fcdn-avatars.huggingface.co%2Fv1%2Falice" }, "vault_path": "/data/vaults/alice", "created": "2025-01-15T10:30:00Z" } } ``` --- ### Note Models ```python from pathlib import Path import re class NoteMetadata(BaseModel): """Frontmatter metadata (arbitrary key-value pairs).""" title: Optional[str] = None tags: Optional[list[str]] = None project: Optional[str] = None created: Optional[datetime] = None updated: Optional[datetime] = None class Config: extra = "allow" # Allow arbitrary fields class Note(BaseModel): """Complete note with content and metadata.""" user_id: str = Field(..., description="Owner user ID") note_path: str = Field( ..., min_length=1, max_length=256, description="Relative path to vault root (includes .md)" ) version: int = Field(..., ge=1, description="Optimistic concurrency version") title: str = Field(..., min_length=1, description="Display title") metadata: NoteMetadata = Field(default_factory=NoteMetadata, description="Frontmatter") body: str = Field(..., description="Markdown content") created: datetime = Field(..., description="Creation timestamp") updated: datetime = Field(..., description="Last update timestamp") size_bytes: int = Field(..., ge=0, le=1_048_576, description="File size in bytes") @validator("note_path") def validate_path(cls, v): """Validate note path format.""" # Must end with .md if not v.endswith('.md'): raise ValueError("Note path must end with .md") # Must not contain .. if '..' in v: raise ValueError("Note path must not contain '..'") # Must use Unix-style separators if '\\' in v: raise ValueError("Note path must use Unix-style separators (/)") # Must not start with / if v.startswith('/'): raise ValueError("Note path must be relative (no leading /)") return v class Config: json_schema_extra = { "example": { "user_id": "alice", "note_path": "api/design.md", "version": 5, "title": "API Design", "metadata": { "tags": ["backend", "api"], "project": "auth-service" }, "body": "# API Design\n\nThis document describes...", "created": "2025-01-10T09:00:00Z", "updated": "2025-01-15T14:30:00Z", "size_bytes": 4096 } } class NoteCreate(BaseModel): """Request to create a new note.""" note_path: str = Field(..., min_length=1, max_length=256) title: Optional[str] = None metadata: Optional[NoteMetadata] = None body: str = Field(..., max_length=1_048_576) class NoteUpdate(BaseModel): """Request to update an existing note.""" title: Optional[str] = None metadata: Optional[NoteMetadata] = None body: str = Field(..., max_length=1_048_576) if_version: Optional[int] = Field(None, ge=1, description="Expected version for concurrency check") class NoteSummary(BaseModel): """Lightweight note summary for listings.""" note_path: str title: str updated: datetime ``` --- ### Index Models ```python class Wikilink(BaseModel): """Bidirectional link between notes.""" user_id: str source_path: str target_path: Optional[str] = Field(None, description="Null if unresolved") link_text: str is_resolved: bool class Config: json_schema_extra = { "example": { "user_id": "alice", "source_path": "api/design.md", "target_path": "api/endpoints.md", "link_text": "Endpoints", "is_resolved": True } } class Tag(BaseModel): """Tag with note count.""" tag_name: str count: int = Field(..., ge=0) class IndexHealth(BaseModel): """Index state and freshness metrics.""" user_id: str note_count: int = Field(..., ge=0) last_full_rebuild: Optional[datetime] = None last_incremental_update: Optional[datetime] = None class Config: json_schema_extra = { "example": { "user_id": "alice", "note_count": 142, "last_full_rebuild": "2025-01-01T00:00:00Z", "last_incremental_update": "2025-01-15T14:30:00Z" } } ``` --- ### Search Models ```python class SearchResult(BaseModel): """Full-text search result with snippet.""" note_path: str title: str snippet: str = Field(..., description="Highlighted excerpt from body") score: float = Field(..., description="Relevance score (title 3x, body 1x, recency bonus)") updated: datetime class SearchRequest(BaseModel): """Full-text search query.""" query: str = Field(..., min_length=1, max_length=256) limit: int = Field(50, ge=1, le=100) ``` --- ### Authentication Models ```python class TokenResponse(BaseModel): """JWT token issuance response.""" token: str = Field(..., description="JWT access token") token_type: str = Field("bearer", description="Token type") expires_at: datetime = Field(..., description="Expiration timestamp") class JWTPayload(BaseModel): """JWT claims payload.""" sub: str = Field(..., description="Subject (user_id)") iat: int = Field(..., description="Issued at (Unix timestamp)") exp: int = Field(..., description="Expiration (Unix timestamp)") ``` --- ## TypeScript Type Definitions Frontend type definitions for API contracts. ### Core Types ```typescript /** * User account with HF profile */ export interface User { user_id: string; hf_profile?: { username: string; name?: string; avatar_url?: string; }; vault_path: string; created: string; // ISO 8601 } /** * Note metadata (frontmatter) */ export interface NoteMetadata { title?: string; tags?: string[]; project?: string; created?: string; // ISO 8601 updated?: string; // ISO 8601 [key: string]: unknown; // Arbitrary fields } /** * Complete note with content */ export interface Note { user_id: string; note_path: string; version: number; title: string; metadata: NoteMetadata; body: string; created: string; // ISO 8601 updated: string; // ISO 8601 size_bytes: number; } /** * Lightweight note summary for listings */ export interface NoteSummary { note_path: string; title: string; updated: string; // ISO 8601 } /** * Request to create a note */ export interface NoteCreateRequest { note_path: string; title?: string; metadata?: NoteMetadata; body: string; } /** * Request to update a note */ export interface NoteUpdateRequest { title?: string; metadata?: NoteMetadata; body: string; if_version?: number; // Optimistic concurrency } /** * Wikilink with resolution status */ export interface Wikilink { user_id: string; source_path: string; target_path: string | null; // Null if unresolved link_text: string; is_resolved: boolean; } /** * Tag with note count */ export interface Tag { tag_name: string; count: number; } /** * Index health metrics */ export interface IndexHealth { user_id: string; note_count: number; last_full_rebuild: string | null; // ISO 8601 last_incremental_update: string | null; // ISO 8601 } /** * Search result with snippet */ export interface SearchResult { note_path: string; title: string; snippet: string; score: number; updated: string; // ISO 8601 } /** * JWT token response */ export interface TokenResponse { token: string; token_type: "bearer"; expires_at: string; // ISO 8601 } /** * API error response */ export interface APIError { error: string; message: string; detail?: Record; } ``` --- ### Validation Helpers ```typescript /** * Validate note path format */ export function isValidNotePath(path: string): boolean { return ( path.length > 0 && path.length <= 256 && path.endsWith('.md') && !path.includes('..') && !path.includes('\\') && !path.startsWith('/') ); } /** * Normalize tag name (lowercase, trim) */ export function normalizeTag(tag: string): string { return tag.toLowerCase().trim(); } /** * Normalize slug for wikilink resolution */ export function normalizeSlug(text: string): string { return text .toLowerCase() .replace(/[\s_]+/g, '-') // Spaces/underscores → dash .replace(/[^a-z0-9-]/g, '') // Keep alphanumeric + dash .replace(/-+/g, '-') // Collapse dashes .replace(/^-+|-+$/g, ''); // Trim dashes } /** * Extract wikilinks from markdown body */ export function extractWikilinks(markdown: string): string[] { const pattern = /\[\[([^\]]+)\]\]/g; const matches: string[] = []; let match; while ((match = pattern.exec(markdown)) !== null) { matches.push(match[1]); } return matches; } ``` --- ## Validation Rules Comprehensive validation constraints for all entities. ### Note Path Validation ```python import re from pathlib import Path def validate_note_path(path: str) -> tuple[bool, str]: """ Validate note path format. Returns (is_valid, error_message). """ # Length check if not path or len(path) > 256: return False, "Path must be 1-256 characters" # Must end with .md if not path.endswith('.md'): return False, "Path must end with .md" # Must not contain .. if '..' in path: return False, "Path must not contain '..'" # Must use Unix-style separators if '\\' in path: return False, "Path must use Unix separators (/)" # Must be relative if path.startswith('/'): return False, "Path must be relative (no leading /)" # Must not have invalid characters invalid_chars = ['<', '>', ':', '"', '|', '?', '*'] if any(c in path for c in invalid_chars): return False, f"Path contains invalid characters: {invalid_chars}" return True, "" def sanitize_path(user_id: str, vault_root: str, note_path: str) -> Path: """ Sanitize and resolve note path within vault. Raises ValueError if path escapes vault root. """ vault = Path(vault_root) / user_id full_path = (vault / note_path).resolve() # Ensure path is within vault if not str(full_path).startswith(str(vault.resolve())): raise ValueError(f"Path escapes vault root: {note_path}") return full_path ``` --- ### Note Content Validation ```python def validate_note_content(body: str) -> tuple[bool, str]: """ Validate note content. Returns (is_valid, error_message). """ # Size check (1 MiB max) size_bytes = len(body.encode('utf-8')) if size_bytes > 1_048_576: return False, f"Note exceeds 1 MiB limit ({size_bytes} bytes)" # UTF-8 validity try: body.encode('utf-8') except UnicodeEncodeError as e: return False, f"Invalid UTF-8 encoding: {e}" return True, "" def validate_frontmatter(metadata: dict) -> tuple[bool, str]: """ Validate frontmatter metadata. Returns (is_valid, error_message). """ # Check for reserved fields reserved = ['version'] # Version is managed by index, not frontmatter for key in metadata.keys(): if key in reserved: return False, f"Field '{key}' is reserved and cannot be set in frontmatter" # Validate tags format if 'tags' in metadata: tags = metadata['tags'] if not isinstance(tags, list): return False, "Field 'tags' must be an array" if not all(isinstance(t, str) for t in tags): return False, "All tags must be strings" return True, "" ``` --- ### Vault Limits ```python def check_vault_limit(user_id: str, db) -> tuple[bool, str]: """ Check if vault is within note limit. Returns (is_allowed, error_message). """ cursor = db.execute( "SELECT note_count FROM index_health WHERE user_id = ?", (user_id,) ) row = cursor.fetchone() if row is None: return True, "" # New vault, no limit yet note_count = row[0] if note_count >= 5000: return False, "Vault note limit exceeded (max 5,000 notes)" return True, "" ``` --- ### Token Validation ```python import jwt from datetime import datetime, timedelta SECRET_KEY = "your-secret-key" # From env var def create_jwt(user_id: str) -> str: """Create JWT with 90-day expiration.""" now = datetime.utcnow() payload = { "sub": user_id, "iat": int(now.timestamp()), "exp": int((now + timedelta(days=90)).timestamp()) } return jwt.encode(payload, SECRET_KEY, algorithm="HS256") def validate_jwt(token: str) -> tuple[bool, str, str]: """ Validate JWT and extract user_id. Returns (is_valid, user_id, error_message). """ try: payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"]) user_id = payload["sub"] return True, user_id, "" except jwt.ExpiredSignatureError: return False, "", "Token expired" except jwt.InvalidTokenError as e: return False, "", f"Invalid token: {e}" ``` --- ## State Transitions State machines for entity lifecycle and version management. ### Note Lifecycle ```mermaid stateDiagram-v2 [*] --> Creating: write_note (new path) Creating --> Active: save to filesystem + insert metadata Active --> Updating: write_note (existing path) Updating --> Active: increment version + update index Active --> Deleting: delete_note Deleting --> [*]: remove file + delete index rows Updating --> ConflictDetected: if_version mismatch (UI writes only) ConflictDetected --> Active: reload or discard changes ``` **State descriptions**: 1. **Creating**: Note does not exist, first write in progress - Validate path and content - Set `version = 1`, `created = now()`, `updated = now()` - Write file to filesystem - Insert rows into `note_metadata`, `note_fts`, `note_tags`, `note_links` 2. **Active**: Note exists and is readable/editable - Can be read via API/MCP - Can be updated via write_note - Can be deleted via delete_note 3. **Updating**: Modification in progress - Load current metadata (version, timestamps) - If UI write: check `if_version` matches current version - If version mismatch: transition to ConflictDetected - If MCP write: skip version check (last-write-wins) - Increment `version`, set `updated = now()` - Update file content - Update all index rows (delete old, insert new) 4. **ConflictDetected**: Optimistic concurrency conflict (UI only) - Return `409 Conflict` with current and expected versions - UI displays error: "Note changed since you opened it" - User options: reload, save as copy, or discard changes 5. **Deleting**: Removal in progress - Delete file from filesystem - Delete rows from `note_metadata`, `note_fts`, `note_tags`, `note_links` - Update backlinks (any note linking to deleted note now has unresolved link) - Decrement `index_health.note_count` --- ### Version Increment Logic ```python def increment_version(user_id: str, note_path: str, if_version: int | None, db) -> int: """ Increment note version with optional concurrency check. Returns new version number. Raises ConflictError if if_version doesn't match. """ # Get current version cursor = db.execute( "SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?", (user_id, note_path) ) row = cursor.fetchone() if row is None: # New note return 1 current_version = row[0] # Optimistic concurrency check (UI writes only) if if_version is not None and current_version != if_version: raise ConflictError( f"Version conflict: expected {if_version}, current is {current_version}" ) # Increment version new_version = current_version + 1 return new_version ``` --- ### Index Update Workflow ```mermaid stateDiagram-v2 [*] --> IncrementalUpdate: write_note or delete_note IncrementalUpdate --> DeleteOldRows: start transaction DeleteOldRows --> ExtractMetadata: parse frontmatter + body ExtractMetadata --> InsertNewRows: insert into all index tables InsertNewRows --> UpdateHealth: set last_incremental_update UpdateHealth --> [*]: commit transaction [*] --> FullRebuild: POST /api/index/rebuild FullRebuild --> DropUserRows: delete all rows for user_id DropUserRows --> ScanVault: walk filesystem tree ScanVault --> ProcessNote: for each .md file ProcessNote --> ExtractMetadata ProcessNote --> ScanVault: next file ScanVault --> UpdateHealth: set last_full_rebuild UpdateHealth --> [*]: commit transaction ``` **Incremental update** (on every write/delete): 1. Start SQLite transaction 2. Delete all existing rows for `(user_id, note_path)` from: - `note_metadata` - `note_fts` - `note_tags` - `note_links` 3. Parse note content (frontmatter + body) 4. Extract: title, tags, wikilinks 5. Insert new rows into all index tables 6. Resolve wikilinks and update `is_resolved` flags 7. Update `index_health.last_incremental_update = now()` 8. Commit transaction **Full rebuild** (manual trigger): 1. Start SQLite transaction 2. Delete all rows for `user_id` from all index tables 3. Walk vault directory tree, find all `.md` files 4. For each file: - Parse frontmatter + body - Extract metadata, tags, wikilinks - Insert rows into all index tables 5. Resolve all wikilinks (second pass after all notes indexed) 6. Update `index_health.note_count` and `last_full_rebuild = now()` 7. Commit transaction --- ## Relationships and Constraints ### Foreign Key Relationships While SQLite supports foreign keys, we don't enforce them for performance reasons (multi-tenant with user-scoped queries). Instead, we rely on application-level referential integrity. **Logical relationships**: - `note_metadata.user_id` → `User.user_id` - `note_tags.note_path` → `note_metadata.note_path` - `note_links.source_path` → `note_metadata.note_path` - `note_links.target_path` → `note_metadata.note_path` (nullable) **Cascade semantics** (application-enforced): - On delete note: cascade delete from `note_tags`, `note_links` (source), `note_fts` - On delete note: update `note_links` (target) to set `is_resolved = false` --- ### Uniqueness Constraints | Table | Unique Constraint | Enforced By | |-------|------------------|-------------| | `note_metadata` | `(user_id, note_path)` | PRIMARY KEY | | `note_tags` | `(user_id, note_path, tag)` | PRIMARY KEY | | `note_links` | `(user_id, source_path, link_text)` | PRIMARY KEY | | `index_health` | `user_id` | PRIMARY KEY | --- ### Cardinality | Relationship | Type | Notes | |-------------|------|-------| | User → Vault | 1:1 | One user owns one vault | | Vault → Notes | 1:N | One vault contains many notes (max 5,000) | | Note → Tags | N:M | Many-to-many via `note_tags` junction table | | Note → Outgoing Links | 1:N | One note has many outgoing wikilinks | | Note → Backlinks | 1:N | One note may be referenced by many backlinks | | User → Tokens | 1:N | One user can issue multiple JWT tokens | --- ### Invariants Critical invariants maintained by the system: 1. **Version monotonicity**: `note.version` only increases (never decreases or resets) 2. **Timestamp ordering**: `note.created <= note.updated` always 3. **Path uniqueness**: No two notes with same `(user_id, note_path)` can exist 4. **Size limit**: `note.size_bytes <= 1_048_576` always enforced 5. **Vault limit**: `COUNT(*) WHERE user_id = X <= 5000` enforced before writes 6. **Link consistency**: If `note_links.target_path` is not null, target note must exist 7. **Tag normalization**: All `note_tags.tag` values are lowercase 8. **Index freshness**: `index_health.last_incremental_update` is always >= most recent `note_metadata.updated` for that user --- ## Appendix ### Common Queries Reference ```sql -- Get all notes for user, sorted by recent update SELECT note_path, title, updated FROM note_metadata WHERE user_id = ? ORDER BY updated DESC LIMIT 100; -- Full-text search with title boost SELECT note_path, title, snippet(note_fts, 3, '', '', '...', 32) AS snippet, bm25(note_fts, 3.0, 1.0) AS score FROM note_fts WHERE user_id = ? AND note_fts MATCH ? ORDER BY score DESC LIMIT 50; -- Get all tags with counts SELECT tag, COUNT(DISTINCT note_path) as count FROM note_tags WHERE user_id = ? GROUP BY tag ORDER BY count DESC; -- Get backlinks for a note SELECT DISTINCT l.source_path, m.title FROM note_links l JOIN note_metadata m ON l.user_id = m.user_id AND l.source_path = m.note_path WHERE l.user_id = ? AND l.target_path = ? ORDER BY m.updated DESC; -- Get all unresolved wikilinks for a user SELECT source_path, link_text, COUNT(*) as occurrences FROM note_links WHERE user_id = ? AND is_resolved = 0 GROUP BY source_path, link_text ORDER BY occurrences DESC; -- Check index health SELECT note_count, last_full_rebuild, last_incremental_update FROM index_health WHERE user_id = ?; ``` --- ### Migration Strategy For future schema changes: ```python # Example migration: Add column to note_metadata def migrate_v1_to_v2(db): """Add normalized_title_slug column.""" db.execute(""" ALTER TABLE note_metadata ADD COLUMN normalized_title_slug TEXT; """) # Backfill existing notes db.execute(""" UPDATE note_metadata SET normalized_title_slug = LOWER( REPLACE(REPLACE(title, ' ', '-'), '_', '-') ); """) db.execute(""" CREATE INDEX idx_metadata_title_slug ON note_metadata(user_id, normalized_title_slug); """) db.commit() ``` --- **Document Status**: Draft **Last Updated**: 2025-11-15 **Next Review**: After Phase 1 implementation