# State Persistence Bug - Analysis and Fix

## Date
October 1, 2025

## Overview
Bug where speaker names, summary, and title from the first audio file persist and incorrectly display when loading a different audio source (file upload, YouTube, or podcast).

## Problem Statement (Bug 2.4.4)

### User Story
**As a user**, when I:
1. Load an audio file and transcribe it
2. Edit/detect speaker names (e.g., "Alice", "Bob")
3. Generate summary and title
4. Load a DIFFERENT audio file (upload, YouTube, podcast)
5. **Expected:** Clean slate - no speaker names, no summary, no title
6. **Actual:** Previous speaker names, summary, and title still visible

### Visual Example

**Scenario:**
```
Step 1: Load "podcast_interview.mp3"
  - Transcribe → 2 speakers detected
  - Edit names: Speaker 0 = "Alice", Speaker 1 = "Bob"
  - Generate summary: "Interview about AI..."
  - Title: "AI Discussion with Alice"

Step 2: Load "meeting_recording.mp3" (different audio)
  - Audio player shows new file ✓
  - Transcript: EMPTY (not yet transcribed) ✓
  - Speaker names: Still shows "Alice", "Bob" from previous file ✗
  - Summary: Still shows "Interview about AI..." ✗
  - Title: Still shows "AI Discussion with Alice" ✗

Step 3: Transcribe new audio
  - New transcript appears with 3 speakers
  - Tags show: "Alice", "Bob", "Speaker 3" (mixed old/new!) ✗
  - Summary: Still old summary ✗
```

### Impact
- **Confusion:** Users see speaker names from different audio files
- **Data Integrity:** Mixed data from multiple sessions
- **Trust Issue:** Users can't trust the displayed information
- **UX Problem:** Must manually clear/reset before each new file

## Root Cause Analysis

### Current State Management

**State Object:**
```javascript
const state = {
  config: { moonshine: {}, sensevoice: {}, llms: {} },
  backend: 'sensevoice',
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: {},  // ❌ NOT reset when source changes
  summary: '',       // ❌ NOT reset when source changes
  title: '',         // ❌ NOT reset when source changes
  audioUrl: null,
  sourcePath: null,
  uploadedFile: null,
  transcribing: false,
  summarizing: false,
  detectingSpeakerNames: false,
  transcriptionController: null,
  summaryController: null,
};
```

### Existing Reset Function

**Location:** `frontend/app.js:resetTranscriptionState()` (lines 265-273)

```javascript
function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
  // ❌ MISSING: state.speakerNames = {};
  // ❌ MISSING: state.summary = '';
  // ❌ MISSING: state.title = '';
  // ❌ MISSING: Clear summary/title UI elements
}
```

**Called only by:** `handleTranscription()` (line 302)

### Source Change Functions

#### Function 1: `handleFileUpload()` (lines 1119-1127)
```javascript
function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
  // ❌ MISSING: No call to reset state
}
```

#### Function 2: `handleYoutubeFetch()` (lines 1129-1147)
```javascript
async function handleYoutubeFetch() {
  // ... fetch logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('YouTube audio ready', 'success');
  // ❌ MISSING: No call to reset state
}
```

#### Function 3: `downloadEpisode()` (lines 1226-1258)
```javascript
async function downloadEpisode(audioUrl, title, triggerButton = null) {
  // ... download logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('Episode ready', 'success');
  // ❌ MISSING: No call to reset state
}
```

### Why It Happens

**Problem Flow:**
```
1. User loads Audio A
   → state.speakerNames, summary, title are empty

2. User transcribes Audio A
   → resetTranscriptionState() called (clears transcript, but NOT speaker names)
   → Transcription creates new utterances
   → state.speakerNames gets populated

3. User edits speaker names, generates summary
   → state.speakerNames = { 0: "Alice", 1: "Bob" }
   → state.summary = "Interview..."
   → state.title = "AI Discussion"

4. User loads Audio B (via upload, YouTube, or podcast)
   → handleFileUpload/handleYoutubeFetch/downloadEpisode called
   → Audio player source changed ✓
   → state.audioUrl/uploadedFile updated ✓
   → BUT state.speakerNames, summary, title NOT cleared ✗

5. User transcribes Audio B
   → resetTranscriptionState() called
   → Clears utterances, diarization stats ✓
   → BUT does NOT clear speakerNames, summary, title ✗
   → New transcription with old speaker names appears!
```

## Solution Design

### Design Principles

1. **Complete Reset:** Clear ALL session-specific data when source changes
2. **Clear Intent:** Reset should happen immediately when new source loaded
3. **Separation of Concerns:**
   - Transcription reset: Clear transcription-related data
   - Session reset: Clear ALL session data including summary, title, speaker names
4.
   **Consistent Behavior:** Same reset logic for all source types (upload, YouTube, podcast)

### Two-Level Reset Strategy

#### Level 1: Reset Transcription Data (Existing)
**When:** Before starting new transcription
**What:** Utterances, diarization stats, transcript UI

```javascript
function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}
```

#### Level 2: Reset Complete Session (NEW)
**When:** When new audio source is loaded
**What:** Everything from Level 1 + speaker names + summary + title

```javascript
function resetCompleteSession() {
  // Level 1: Reset transcription data
  resetTranscriptionState();

  // Level 2: Reset speaker names
  state.speakerNames = {};

  // Level 2: Reset summary and title (state and UI)
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';

  // Level 2: Reset timeline segments
  renderTimelineSegments(); // Will be empty with no utterances

  // Optional: Hide detect speaker names button
  elements.detectSpeakerNamesBtn.classList.add('hidden');
}
```

## Implementation

### Change 1: Create `resetCompleteSession()` Function

**File:** `frontend/app.js` (after `resetTranscriptionState()`)

```javascript
function resetCompleteSession() {
  // Reset transcription data
  resetTranscriptionState();

  // Reset speaker names
  state.speakerNames = {};

  // Reset summary and title
  state.summary = '';
  state.title = '';

  // Clear summary and title UI
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';

  // Reset timeline visualization
  renderTimelineSegments();

  // Hide speaker name detection button
  elements.detectSpeakerNamesBtn.classList.add('hidden');

  // Reset status (overwritten by the caller's own status message)
  setStatus('Ready for new transcription', 'info');
}
```

### Change 2: Call Reset on File Upload

**File:**
`frontend/app.js:handleFileUpload()` (lines ~1119-1127)

```javascript
function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;

  // Reset complete session when new file loaded
  resetCompleteSession();

  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
}
```

### Change 3: Call Reset on YouTube Fetch

**File:** `frontend/app.js:handleYoutubeFetch()` (lines ~1129-1147)

```javascript
async function handleYoutubeFetch() {
  if (!elements.youtubeUrl.value.trim()) return;
  setStatus('Downloading audio from YouTube...', 'info');
  try {
    const res = await fetch('/api/youtube/fetch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: elements.youtubeUrl.value.trim() }),
    });
    if (!res.ok) throw new Error('YouTube download failed');
    const data = await res.json();

    // Reset complete session when new YouTube audio loaded
    resetCompleteSession();

    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('YouTube audio ready', 'success');
  } catch (err) {
    console.error(err);
    setStatus(err.message, 'error');
  }
}
```

### Change 4: Call Reset on Podcast Episode Download

**File:** `frontend/app.js:downloadEpisode()` (lines ~1226-1258)

```javascript
async function downloadEpisode(audioUrl, title, triggerButton = null) {
  setStatus('Downloading episode...', 'info');
  let originalLabel = null;
  if (triggerButton) {
    originalLabel = triggerButton.innerHTML;
    triggerButton.disabled = true;
    triggerButton.classList.add('loading');
    triggerButton.textContent = 'Downloading…';
  }

  try {
    const res = await fetch('/api/podcast/download', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ audioUrl, title }),
    });
    if (!res.ok) throw new Error('Episode download failed');
    const data = await res.json();

    // Reset complete session when new episode loaded
    resetCompleteSession();

    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('Episode ready', 'success');
    // ... rest of the function
  } catch (err) {
    // ... error handling
  }
}
```

## Behavior After Fix

### Example Scenario

**Step 1: Load and Process First Audio**
```
1. Upload "interview.mp3"
   → resetCompleteSession() called
   → Clean slate: no utterances, speaker names, summary, title

2. Transcribe
   → resetTranscriptionState() called (redundant but harmless)
   → Transcript appears, 2 speakers detected

3. Edit speaker names
   → state.speakerNames = { 0: "Alice", 1: "Bob" }

4. Generate summary
   → state.summary = "Interview about AI..."
   → state.title = "AI Discussion"
```

**Step 2: Load Different Audio**
```
1. Upload "meeting.mp3"
   → resetCompleteSession() called ✓
   → state.speakerNames = {} (cleared) ✓
   → state.summary = '' (cleared) ✓
   → state.title = '' (cleared) ✓
   → Summary UI cleared ✓
   → Title UI cleared ✓
   → Timeline cleared ✓
   → Status: "Loaded meeting.mp3"

2. Transcribe
   → Fresh transcript with 3 speakers
   → Speaker tags show: "Speaker 1", "Speaker 2", "Speaker 3" ✓
   → No contamination from previous audio ✓
```

**Step 3: Generate New Summary**
```
1. Click "Generate Summary"
   → New summary generated for current audio ✓
   → Replaces old summary (already cleared) ✓
   → New title generated ✓
```

## Edge Cases

### Edge Case 1: Upload Same File Twice
```
1. Upload "audio.mp3"
   → resetCompleteSession() called
2. Transcribe and edit
3. Upload same "audio.mp3" again
   → resetCompleteSession() called (data cleared)
   → User must transcribe again

Decision: Acceptable - user explicitly chose to reload
```

### Edge Case 2: Change Source During Transcription
```
1. Start transcription of "audio1.mp3"
2. Mid-transcription, upload "audio2.mp3"
   → resetCompleteSession() called
   → Partial transcription cleared
   → New audio loaded

Decision: Acceptable - user action indicates intent to switch
Note: Transcription abort handling already exists
```

### Edge Case 3: YouTube Fetch While Audio Playing
```
1. Upload file, play audio
2. Fetch YouTube audio
   → resetCompleteSession() called
   → Audio player source changed
   → Playback stops (normal behavior)

Decision: Acceptable - expected behavior when changing source
```

### Edge Case 4: Multiple Podcast Episodes in Sequence
```
1. Download episode 1
   → resetCompleteSession()
2. Transcribe episode 1
3. Download episode 2
   → resetCompleteSession() (episode 1 data cleared)
4. Transcribe episode 2

Decision: Correct behavior - each episode is independent
```

## UI Elements to Reset

### Complete Checklist

**State Variables:**
- [x] `state.utterances` (via resetTranscriptionState)
- [x] `state.diarizedUtterances` (via resetTranscriptionState)
- [x] `state.diarizationStats` (via resetTranscriptionState)
- [x] `state.speakerNames` (NEW)
- [x] `state.summary` (NEW)
- [x] `state.title` (NEW)
- [x] `activeUtteranceIndex` (via resetTranscriptionState)

**DOM Elements:**
- [x] `elements.transcriptList` (via resetTranscriptionState)
- [x] `elements.utteranceCount` (via resetTranscriptionState)
- [x] `elements.diarizationPanel` (via resetTranscriptionState)
- [x] `elements.diarizationMetrics` (via renderDiarizationStats after reset)
- [x] `elements.speakerBreakdown` (via renderDiarizationStats after reset)
- [x] `elements.summaryOutput` (NEW)
- [x] `elements.titleOutput` (NEW)
- [x] `elements.timelineSegments` (via renderTimelineSegments)
- [x] `elements.detectSpeakerNamesBtn` visibility (NEW)

## Testing Scenarios

### ✅ Test 1: Upload → Edit → Upload New
1. Upload "audio1.mp3"
2. Transcribe, edit speaker names to "Alice", "Bob"
3. Generate summary "Summary 1"
4. Upload "audio2.mp3"
5. **Verify:** Speaker names cleared, summary cleared, title cleared
6.
   Transcribe
7. **Verify:** Speaker tags show "Speaker 1", "Speaker 2" (not Alice/Bob)

### ✅ Test 2: YouTube → Summary → Podcast
1. Fetch YouTube audio
2. Transcribe, generate summary
3. Download podcast episode
4. **Verify:** YouTube summary cleared
5. Transcribe podcast
6. **Verify:** Independent transcript and summary

### ✅ Test 3: Podcast → Names → YouTube
1. Download podcast
2. Transcribe, detect speaker names
3. Fetch YouTube audio
4. **Verify:** Podcast speaker names cleared
5. Transcribe YouTube
6. **Verify:** No podcast names visible

### ✅ Test 4: Rapid Source Changes
1. Upload file
2. Immediately fetch YouTube (before transcription)
3. **Verify:** File data cleared, YouTube ready
4. Immediately download podcast
5. **Verify:** YouTube data cleared, podcast ready

### ✅ Test 5: Same Source Reload
1. Upload "audio.mp3", transcribe, edit
2. Upload same "audio.mp3" again
3. **Verify:** Previous edits cleared (fresh start)

### ✅ Test 6: Timeline Visualization
1. Upload audio, transcribe (timeline segments appear)
2. Upload different audio
3. **Verify:** Timeline segments cleared (empty)
4. Transcribe new audio
5.
   **Verify:** New timeline segments appear

## Performance Considerations

- **resetCompleteSession():** A fixed number of state assignments plus DOM clearing; the DOM work scales with the number of rendered transcript nodes being removed, but is fast in practice
- **Called only on source change:** Infrequent user action
- **Impact:** Negligible (<1ms)

## Backward Compatibility

- ✅ Existing `resetTranscriptionState()` unchanged
- ✅ New function adds capability, doesn't break existing code
- ✅ No API changes required
- ✅ No breaking changes to user workflow

## Implementation Checklist

- [ ] Create `resetCompleteSession()` function
- [ ] Update `handleFileUpload()` to call reset
- [ ] Update `handleYoutubeFetch()` to call reset
- [ ] Update `downloadEpisode()` to call reset
- [ ] Test all source change scenarios
- [ ] Verify UI elements cleared
- [ ] Verify no data contamination between sessions
- [ ] Update documentation
- [ ] Commit changes

## Related Bugs

- Bug 2.4.1: Manual speaker name propagation (Fixed)
- Bug 2.4.2: Auto-detection UI update (Fixed)
- Bug 2.4.3: Clear name to enable detection (Fixed)
- Bug 2.4.4: State persistence across audio files (This bug)

## Files to Modify

### `/home/luigi/VoxSum/frontend/app.js`
- **New Function:** `resetCompleteSession()` (after line 273)
- **Modify:** `handleFileUpload()` (line ~1122)
- **Modify:** `handleYoutubeFetch()` (line ~1141)
- **Modify:** `downloadEpisode()` (line ~1239)
- **Impact:** ~40 lines added/modified

## Conclusion

The bug is caused by an incomplete state reset when the audio source changes. The fix is a comprehensive `resetCompleteSession()` function that clears ALL session data (transcription, speaker names, summary, title) and is called whenever a new audio source is loaded (file upload, YouTube, podcast). This gives each audio file a clean slate and prevents data contamination between sessions.
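
The core of the verification steps in the testing scenarios can also be exercised outside the browser. The following is a minimal sketch, not the project's actual test code: it copies the two reset functions from this document and runs them against stubbed DOM elements. The `makeStubElement()` helper, the `lastStatus` variable, the no-op `renderTimelineSegments()` stub, and the `{ speaker, text }` utterance shape are all assumptions made for illustration only.

```javascript
// Hypothetical stand-in for a DOM node: only the properties the reset code touches.
function makeStubElement() {
  const classes = new Set();
  return {
    innerHTML: '',
    textContent: '',
    classList: {
      add: (name) => classes.add(name),
      contains: (name) => classes.has(name),
    },
  };
}

const elements = {
  transcriptList: makeStubElement(),
  utteranceCount: makeStubElement(),
  diarizationPanel: makeStubElement(),
  summaryOutput: makeStubElement(),
  titleOutput: makeStubElement(),
  detectSpeakerNamesBtn: makeStubElement(),
};

// Only the session-specific fields of the real state object are modelled here.
const state = {
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: {},
  summary: '',
  title: '',
};
let activeUtteranceIndex = -1;

// Stubs (assumptions) for helpers that live elsewhere in app.js.
let lastStatus = null;
function setStatus(message, level) { lastStatus = { message, level }; }
function renderTimelineSegments() { /* no-op in this sketch */ }

// Level 1 reset, as shown in this document.
function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}

// Level 2 reset, as proposed in Change 1.
function resetCompleteSession() {
  resetTranscriptionState();
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  renderTimelineSegments();
  elements.detectSpeakerNamesBtn.classList.add('hidden');
  setStatus('Ready for new transcription', 'info');
}

// Simulate a finished first session (values taken from the scenario above).
state.utterances = [{ speaker: 0, text: 'Hello' }]; // utterance shape is assumed
state.speakerNames = { 0: 'Alice', 1: 'Bob' };
state.summary = 'Interview about AI...';
state.title = 'AI Discussion with Alice';
elements.summaryOutput.innerHTML = '<p>Interview about AI...</p>';
elements.titleOutput.textContent = state.title;

// Simulate loading a different audio source.
resetCompleteSession();

console.log('speaker names left:', Object.keys(state.speakerNames).length);
console.log('summary empty:', state.summary === '', 'title empty:', state.title === '');
```

Because the stubs replace the real DOM, this only checks that state and the directly assigned element properties are cleared; it says nothing about what `renderTimelineSegments()` or the real source-change handlers do, which still need the manual scenarios above.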