| # State Persistence Bug - Analysis and Fix | |
| ## Date | |
| October 1, 2025 | |
| ## Overview | |
| Speaker names, the summary, and the title from the first audio file persist and are displayed incorrectly after a different audio source is loaded (file upload, YouTube, or podcast). | |
| ## Problem Statement (Bug 2.4.4) | |
| ### User Story | |
| **As a user**, when I: | |
| 1. Load an audio file and transcribe it | |
| 2. Edit/detect speaker names (e.g., "Alice", "Bob") | |
| 3. Generate summary and title | |
| 4. Load a DIFFERENT audio file (upload, YouTube, podcast) | |
| 5. **Expected:** Clean slate - no speaker names, no summary, no title | |
| 6. **Actual:** Previous speaker names, summary, and title still visible | |
| ### Visual Example | |
| **Scenario:** | |
| ``` | |
| Step 1: Load "podcast_interview.mp3" | |
| - Transcribe → 2 speakers detected | |
| - Edit names: Speaker 0 = "Alice", Speaker 1 = "Bob" | |
| - Generate summary: "Interview about AI..." | |
| - Title: "AI Discussion with Alice" | |
| Step 2: Load "meeting_recording.mp3" (different audio) | |
| - Audio player shows new file ✅ | |
| - Transcript: EMPTY (not yet transcribed) ✅ | |
| - Speaker names: Still shows "Alice", "Bob" from previous file ❌ | |
| - Summary: Still shows "Interview about AI..." ❌ | |
| - Title: Still shows "AI Discussion with Alice" ❌ | |
| Step 3: Transcribe new audio | |
| - New transcript appears with 3 speakers | |
| - Tags show: "Alice", "Bob", "Speaker 3" (mixed old/new!) ❌ | |
| - Summary: Still old summary ❌ | |
| ``` | |
| ### Impact | |
| - **Confusion:** Users see speaker names from different audio files | |
| - **Data Integrity:** Mixed data from multiple sessions | |
| - **Trust Issue:** Users can't trust the displayed information | |
| - **UX Problem:** Must manually clear/reset before each new file | |
| ## Root Cause Analysis | |
| ### Current State Management | |
| **State Object:** | |
| ```javascript | |
| const state = { | |
|   config: { moonshine: {}, sensevoice: {}, llms: {} }, | |
|   backend: 'sensevoice', | |
|   utterances: [], | |
|   diarizedUtterances: null, | |
|   diarizationStats: null, | |
|   speakerNames: {}, // ❌ NOT reset when source changes | |
|   summary: '', // ❌ NOT reset when source changes | |
|   title: '', // ❌ NOT reset when source changes | |
|   audioUrl: null, | |
|   sourcePath: null, | |
|   uploadedFile: null, | |
|   transcribing: false, | |
|   summarizing: false, | |
|   detectingSpeakerNames: false, | |
|   transcriptionController: null, | |
|   summaryController: null, | |
| }; | |
| ``` | |
| ### Existing Reset Function | |
| **Location:** `frontend/app.js:resetTranscriptionState()` (lines 265-273) | |
| ```javascript | |
| function resetTranscriptionState() { | |
|   state.utterances = []; | |
|   state.diarizedUtterances = null; | |
|   state.diarizationStats = null; | |
|   activeUtteranceIndex = -1; | |
|   elements.transcriptList.innerHTML = ''; | |
|   elements.utteranceCount.textContent = ''; | |
|   elements.diarizationPanel.classList.add('hidden'); | |
|   // ❌ MISSING: state.speakerNames = {}; | |
|   // ❌ MISSING: state.summary = ''; | |
|   // ❌ MISSING: state.title = ''; | |
|   // ❌ MISSING: Clear summary/title UI elements | |
| } | |
| ``` | |
| **Called only by:** `handleTranscription()` (line 302) | |
| ### Source Change Functions | |
| #### Function 1: `handleFileUpload()` (lines 1119-1127) | |
| ```javascript | |
| function handleFileUpload(event) { | |
|   const file = event.target.files?.[0]; | |
|   if (!file) return; | |
|   state.uploadedFile = file; | |
|   state.audioUrl = null; | |
|   const objectUrl = URL.createObjectURL(file); | |
|   elements.audioPlayer.src = objectUrl; | |
|   setStatus(`Loaded ${file.name}`, 'info'); | |
|   // ❌ MISSING: No call to reset state | |
| } | |
| ``` | |
| #### Function 2: `handleYoutubeFetch()` (lines 1129-1147) | |
| ```javascript | |
| async function handleYoutubeFetch() { | |
|   // ... fetch logic ... | |
|   state.audioUrl = data.audioUrl; | |
|   state.uploadedFile = null; | |
|   elements.audioPlayer.src = data.audioUrl; | |
|   setStatus('YouTube audio ready', 'success'); | |
|   // ❌ MISSING: No call to reset state | |
| } | |
| ``` | |
| #### Function 3: `downloadEpisode()` (lines 1226-1258) | |
| ```javascript | |
| async function downloadEpisode(audioUrl, title, triggerButton = null) { | |
|   // ... download logic ... | |
|   state.audioUrl = data.audioUrl; | |
|   state.uploadedFile = null; | |
|   elements.audioPlayer.src = data.audioUrl; | |
|   setStatus('Episode ready', 'success'); | |
|   // ❌ MISSING: No call to reset state | |
| } | |
| ``` | |
| ### Why It Happens | |
| **Problem Flow:** | |
| ``` | |
| 1. User loads Audio A | |
|    → state.speakerNames, summary, title are empty | |
| 2. User transcribes Audio A | |
|    → resetTranscriptionState() called (clears transcript, but NOT speaker names) | |
|    → Transcription creates new utterances | |
|    → state.speakerNames gets populated | |
| 3. User edits speaker names, generates summary | |
|    → state.speakerNames = { 0: "Alice", 1: "Bob" } | |
|    → state.summary = "Interview..." | |
|    → state.title = "AI Discussion" | |
| 4. User loads Audio B (via upload, YouTube, or podcast) | |
|    → handleFileUpload/handleYoutubeFetch/downloadEpisode called | |
|    → Audio player source changed ✅ | |
|    → state.audioUrl/uploadedFile updated ✅ | |
|    → BUT state.speakerNames, summary, title NOT cleared ❌ | |
| 5. User transcribes Audio B | |
|    → resetTranscriptionState() called | |
|    → Clears utterances, diarization stats ✅ | |
|    → BUT does NOT clear speakerNames, summary, title ❌ | |
|    → New transcription with old speaker names appears! | |
| ``` | |
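The flow above can be reproduced without a browser. The following is a minimal sketch, assuming a trimmed-down copy of the state object; `loadSource()` is a hypothetical stand-in for the three source handlers as they behave before the fix, and all DOM work is left out.

```javascript
// DOM-free model of the bug. Field names mirror the state object shown
// earlier; loadSource() is a hypothetical stand-in, not a function in app.js.
const state = {
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: {},
  summary: '',
  title: '',
  audioUrl: null,
  uploadedFile: null,
};

// Mirrors the existing reset: transcript data only.
function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
}

// Pre-fix behavior of the source handlers: the source changes,
// nothing else is cleared.
function loadSource(audioUrl) {
  state.audioUrl = audioUrl;
  state.uploadedFile = null;
}

// Steps 1-3: first audio is loaded, transcribed, named, and summarized.
loadSource('audio-a.mp3');
resetTranscriptionState();
state.utterances = [{ speaker: 0, text: 'Hello' }, { speaker: 1, text: 'Hi' }];
state.speakerNames = { 0: 'Alice', 1: 'Bob' };
state.summary = 'Interview about AI...';
state.title = 'AI Discussion';

// Steps 4-5: second audio is loaded, then transcription triggers the reset.
loadSource('audio-b.mp3');
resetTranscriptionState();

// The transcript is gone, but the session data of Audio A is still there.
const leaked = {
  speakerNames: state.speakerNames,
  summary: state.summary,
  title: state.title,
};
console.log(leaked);
```

Running this under Node.js shows that the three fields survive both the source change and the transcription reset, which is the contamination described in step 5.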
| ## Solution Design | |
| ### Design Principles | |
| 1. **Complete Reset:** Clear ALL session-specific data when source changes | |
| 2. **Clear Intent:** Reset should happen immediately when new source loaded | |
| 3. **Separation of Concerns:** | |
| - Transcription reset: Clear transcription-related data | |
| - Session reset: Clear ALL session data including summary, title, speaker names | |
| 4. **Consistent Behavior:** Same reset logic for all source types (upload, YouTube, podcast) | |
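One way to support the "Complete Reset" principle is to define the empty session in a single place, so that a field added later cannot be forgotten by the reset. This is only a sketch of that idea, not what the fix below implements: `createEmptySession()` and `resetSessionState()` are hypothetical names, and DOM clearing would still have to happen separately.

```javascript
// Hypothetical helper: the single definition of an "empty session".
// A fresh object is returned on every call so arrays/objects are never shared.
function createEmptySession() {
  return {
    utterances: [],
    diarizedUtterances: null,
    diarizationStats: null,
    speakerNames: {},
    summary: '',
    title: '',
  };
}

const state = {
  backend: 'sensevoice', // app-level setting, must survive a source change
  audioUrl: null,
  ...createEmptySession(),
};

// Hypothetical state-only reset: overwrites session fields, keeps the rest.
function resetSessionState(target) {
  return Object.assign(target, createEmptySession());
}

// Simulate data left over from a previous audio file, then reset.
state.speakerNames = { 0: 'Alice' };
state.summary = 'Interview about AI...';
resetSessionState(state);
console.log(state.speakerNames, state.summary, state.backend);
```

The trade-off is that session and application-level fields must be kept clearly apart; anything listed in the factory is wiped on every source change.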
| ### Two-Level Reset Strategy | |
| #### Level 1: Reset Transcription Data (Existing) | |
| **When:** Before starting new transcription | |
| **What:** Utterances, diarization stats, transcript UI | |
| ```javascript | |
| function resetTranscriptionState() { | |
|   state.utterances = []; | |
|   state.diarizedUtterances = null; | |
|   state.diarizationStats = null; | |
|   activeUtteranceIndex = -1; | |
|   elements.transcriptList.innerHTML = ''; | |
|   elements.utteranceCount.textContent = ''; | |
|   elements.diarizationPanel.classList.add('hidden'); | |
| } | |
| ``` | |
| #### Level 2: Reset Complete Session (NEW) | |
| **When:** When new audio source is loaded | |
| **What:** Everything from Level 1 + speaker names + summary + title | |
| ```javascript | |
| function resetCompleteSession() { | |
|   // Level 1: reset transcription data | |
|   resetTranscriptionState(); | |
|   // Level 2 (everything below): reset speaker names | |
|   state.speakerNames = {}; | |
|   // Reset summary and title (state and UI) | |
|   state.summary = ''; | |
|   state.title = ''; | |
|   elements.summaryOutput.innerHTML = ''; | |
|   elements.titleOutput.textContent = ''; | |
|   // Re-render timeline segments (empty now that there are no utterances) | |
|   renderTimelineSegments(); | |
|   // Optional: hide the detect-speaker-names button | |
|   elements.detectSpeakerNamesBtn.classList.add('hidden'); | |
| } | |
| ``` | |
| ## Implementation | |
| ### Change 1: Create `resetCompleteSession()` Function | |
| **File:** `frontend/app.js` (after `resetTranscriptionState()`) | |
| ```javascript | |
| function resetCompleteSession() { | |
|   // Reset transcription data | |
|   resetTranscriptionState(); | |
|   // Reset speaker names | |
|   state.speakerNames = {}; | |
|   // Reset summary and title | |
|   state.summary = ''; | |
|   state.title = ''; | |
|   // Clear summary and title UI | |
|   elements.summaryOutput.innerHTML = ''; | |
|   elements.titleOutput.textContent = ''; | |
|   // Reset timeline visualization | |
|   renderTimelineSegments(); | |
|   // Hide speaker name detection button | |
|   elements.detectSpeakerNamesBtn.classList.add('hidden'); | |
|   // Reset status (each caller in Changes 2-4 replaces this message right after) | |
|   setStatus('Ready for new transcription', 'info'); | |
| } | |
| ``` | |
| ### Change 2: Call Reset on File Upload | |
| **File:** `frontend/app.js:handleFileUpload()` (lines ~1119-1127) | |
| ```javascript | |
| function handleFileUpload(event) { | |
|   const file = event.target.files?.[0]; | |
|   if (!file) return; | |
|   // Reset complete session when new file loaded | |
|   resetCompleteSession(); | |
|   state.uploadedFile = file; | |
|   state.audioUrl = null; | |
|   const objectUrl = URL.createObjectURL(file); | |
|   elements.audioPlayer.src = objectUrl; | |
|   setStatus(`Loaded ${file.name}`, 'info'); | |
| } | |
| ``` | |
| ### Change 3: Call Reset on YouTube Fetch | |
| **File:** `frontend/app.js:handleYoutubeFetch()` (lines ~1129-1147) | |
| ```javascript | |
| async function handleYoutubeFetch() { | |
|   if (!elements.youtubeUrl.value.trim()) return; | |
|   setStatus('Downloading audio from YouTube...', 'info'); | |
|   try { | |
|     const res = await fetch('/api/youtube/fetch', { | |
|       method: 'POST', | |
|       headers: { 'Content-Type': 'application/json' }, | |
|       body: JSON.stringify({ url: elements.youtubeUrl.value.trim() }), | |
|     }); | |
|     if (!res.ok) throw new Error('YouTube download failed'); | |
|     const data = await res.json(); | |
|     // Reset complete session when new YouTube audio loaded | |
|     resetCompleteSession(); | |
|     state.audioUrl = data.audioUrl; | |
|     state.uploadedFile = null; | |
|     elements.audioPlayer.src = data.audioUrl; | |
|     setStatus('YouTube audio ready', 'success'); | |
|   } catch (err) { | |
|     console.error(err); | |
|     setStatus(err.message, 'error'); | |
|   } | |
| } | |
| ``` | |
| ### Change 4: Call Reset on Podcast Episode Download | |
| **File:** `frontend/app.js:downloadEpisode()` (lines ~1226-1258) | |
| ```javascript | |
| async function downloadEpisode(audioUrl, title, triggerButton = null) { | |
|   setStatus('Downloading episode...', 'info'); | |
|   let originalLabel = null; | |
|   if (triggerButton) { | |
|     originalLabel = triggerButton.innerHTML; | |
|     triggerButton.disabled = true; | |
|     triggerButton.classList.add('loading'); | |
|     triggerButton.textContent = 'Downloading…'; | |
|   } | |
|   try { | |
|     const res = await fetch('/api/podcast/download', { | |
|       method: 'POST', | |
|       headers: { 'Content-Type': 'application/json' }, | |
|       body: JSON.stringify({ audioUrl, title }), | |
|     }); | |
|     if (!res.ok) throw new Error('Episode download failed'); | |
|     const data = await res.json(); | |
|     // Reset complete session when new episode loaded | |
|     resetCompleteSession(); | |
|     state.audioUrl = data.audioUrl; | |
|     state.uploadedFile = null; | |
|     elements.audioPlayer.src = data.audioUrl; | |
|     setStatus('Episode ready', 'success'); | |
|     // ... rest of the function | |
|   } catch (err) { | |
|     // ... error handling | |
|   } | |
| } | |
| ``` | |
| ## Behavior After Fix | |
| ### Example Scenario | |
| **Step 1: Load and Process First Audio** | |
| ``` | |
| 1. Upload "interview.mp3" | |
|    → resetCompleteSession() called | |
|    → Clean slate: no utterances, speaker names, summary, title | |
| 2. Transcribe | |
|    → resetTranscriptionState() called (redundant but harmless) | |
|    → Transcript appears, 2 speakers detected | |
| 3. Edit speaker names | |
|    → state.speakerNames = { 0: "Alice", 1: "Bob" } | |
| 4. Generate summary | |
|    → state.summary = "Interview about AI..." | |
|    → state.title = "AI Discussion" | |
| ``` | |
| **Step 2: Load Different Audio** | |
| ``` | |
| 1. Upload "meeting.mp3" | |
|    → resetCompleteSession() called ✅ | |
|    → state.speakerNames = {} (cleared) ✅ | |
|    → state.summary = '' (cleared) ✅ | |
|    → state.title = '' (cleared) ✅ | |
|    → Summary UI cleared ✅ | |
|    → Title UI cleared ✅ | |
|    → Timeline cleared ✅ | |
|    → Status: "Loaded meeting.mp3" | |
| 2. Transcribe | |
|    → Fresh transcript with 3 speakers | |
|    → Speaker tags show: "Speaker 1", "Speaker 2", "Speaker 3" ✅ | |
|    → No contamination from previous audio ✅ | |
| ``` | |
| **Step 3: Generate New Summary** | |
| ``` | |
| 1. Click "Generate Summary" | |
|    → New summary generated for current audio ✅ | |
|    → Replaces old summary (already cleared) ✅ | |
|    → New title generated ✅ | |
| ``` | |
| ## Edge Cases | |
| ### Edge Case 1: Upload Same File Twice | |
| ``` | |
| 1. Upload "audio.mp3" | |
|    → resetCompleteSession() called | |
| 2. Transcribe and edit | |
| 3. Upload same "audio.mp3" again | |
|    → resetCompleteSession() called (data cleared) | |
|    → User must transcribe again | |
| Decision: Acceptable - user explicitly chose to reload | |
| ``` | |
| ### Edge Case 2: Change Source During Transcription | |
| ``` | |
| 1. Start transcription of "audio1.mp3" | |
| 2. Mid-transcription, upload "audio2.mp3" | |
|    → resetCompleteSession() called | |
|    → Partial transcription cleared | |
|    → New audio loaded | |
| Decision: Acceptable - user action indicates intent to switch | |
| Note: Transcription abort handling already exists | |
| ``` | |
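The note above relies on abort handling that already exists and is not shown in this document. As an illustration of why it matters for this edge case, here is a minimal sketch that assumes `state.transcriptionController` holds a standard `AbortController`; `abortActiveTranscription()` and `startFakeTranscription()` are hypothetical helpers written for this example only.

```javascript
const state = {
  transcribing: false,
  transcriptionController: null,
  utterances: [],
};

// Hypothetical helper: cancel a running transcription before the session reset.
function abortActiveTranscription() {
  if (state.transcriptionController) {
    state.transcriptionController.abort();
    state.transcriptionController = null;
  }
  state.transcribing = false;
}

// Simulated in-flight transcription. It returns a callback that appends
// utterances until its controller has been aborted.
function startFakeTranscription() {
  const controller = new AbortController();
  state.transcriptionController = controller;
  state.transcribing = true;
  return (utterance) => {
    if (controller.signal.aborted) return false; // late chunk is dropped
    state.utterances.push(utterance);
    return true;
  };
}

// Transcription of audio1.mp3 starts and delivers a first chunk.
const pushChunk = startFakeTranscription();
pushChunk({ speaker: 0, text: 'from audio1' });

// The user uploads audio2.mp3 mid-transcription: abort first, then clear.
abortActiveTranscription();
state.utterances = [];

// A chunk from audio1 that arrives late must not reach the new session.
const accepted = pushChunk({ speaker: 0, text: 'late chunk from audio1' });
console.log(accepted, state.utterances.length);
```

If the real transcription path does not check its abort signal before appending utterances, late results from the first file could reappear after the reset, so that path is worth verifying when testing this edge case.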
| ### Edge Case 3: YouTube Fetch While Audio Playing | |
| ``` | |
| 1. Upload file, play audio | |
| 2. Fetch YouTube audio | |
|    → resetCompleteSession() called | |
|    → Audio player source changed | |
|    → Playback stops (normal behavior) | |
| Decision: Acceptable - expected behavior when changing source | |
| ``` | |
| ### Edge Case 4: Multiple Podcast Episodes in Sequence | |
| ``` | |
| 1. Download episode 1 | |
|    → resetCompleteSession() | |
| 2. Transcribe episode 1 | |
| 3. Download episode 2 | |
|    → resetCompleteSession() (episode 1 data cleared) | |
| 4. Transcribe episode 2 | |
| Decision: Correct behavior - each episode is independent | |
| ``` | |
| ## UI Elements to Reset | |
| ### Complete Checklist | |
| **State Variables:** | |
| - [x] `state.utterances` (via resetTranscriptionState) | |
| - [x] `state.diarizedUtterances` (via resetTranscriptionState) | |
| - [x] `state.diarizationStats` (via resetTranscriptionState) | |
| - [x] `state.speakerNames` (NEW) | |
| - [x] `state.summary` (NEW) | |
| - [x] `state.title` (NEW) | |
| - [x] `activeUtteranceIndex` (via resetTranscriptionState) | |
| **DOM Elements:** | |
| - [x] `elements.transcriptList` (via resetTranscriptionState) | |
| - [x] `elements.utteranceCount` (via resetTranscriptionState) | |
| - [x] `elements.diarizationPanel` (via resetTranscriptionState) | |
| - [x] `elements.diarizationMetrics` (via renderDiarizationStats after reset) | |
| - [x] `elements.speakerBreakdown` (via renderDiarizationStats after reset) | |
| - [x] `elements.summaryOutput` (NEW) | |
| - [x] `elements.titleOutput` (NEW) | |
| - [x] `elements.timelineSegments` (via renderTimelineSegments) | |
| - [x] `elements.detectSpeakerNamesBtn` visibility (NEW) | |
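The state half of this checklist can be turned into a small check that reports which fields still hold data. The sketch below is illustrative only: `findStaleSessionFields()` is a hypothetical helper, and it covers the state variables listed above, not the DOM elements.

```javascript
// Hypothetical checker: returns the names of session fields that are not empty.
function findStaleSessionFields(state) {
  const isEmpty = {
    utterances: (v) => Array.isArray(v) && v.length === 0,
    diarizedUtterances: (v) => v === null,
    diarizationStats: (v) => v === null,
    speakerNames: (v) =>
      v !== null && typeof v === 'object' && Object.keys(v).length === 0,
    summary: (v) => v === '',
    title: (v) => v === '',
  };
  return Object.keys(isEmpty).filter((key) => !isEmpty[key](state[key]));
}

// State as left behind by the old, transcription-only reset.
const afterOldReset = {
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: { 0: 'Alice', 1: 'Bob' },
  summary: 'Interview about AI...',
  title: '',
};

// State as expected after a complete session reset.
const afterCompleteReset = { ...afterOldReset, speakerNames: {}, summary: '' };

const staleBefore = findStaleSessionFields(afterOldReset);
const staleAfter = findStaleSessionFields(afterCompleteReset);
console.log(staleBefore, staleAfter);
```

A check like this could be called at the end of each source handler during development to catch fields that a future change forgets to reset.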
| ## Testing Scenarios | |
| ### ✅ Test 1: Upload → Edit → Upload New | |
| 1. Upload "audio1.mp3" | |
| 2. Transcribe, edit speaker names to "Alice", "Bob" | |
| 3. Generate summary "Summary 1" | |
| 4. Upload "audio2.mp3" | |
| 5. **Verify:** Speaker names cleared, summary cleared, title cleared | |
| 6. Transcribe | |
| 7. **Verify:** Speaker tags show "Speaker 1", "Speaker 2" (not Alice/Bob) | |
| ### ✅ Test 2: YouTube → Summary → Podcast | |
| 1. Fetch YouTube audio | |
| 2. Transcribe, generate summary | |
| 3. Download podcast episode | |
| 4. **Verify:** YouTube summary cleared | |
| 5. Transcribe podcast | |
| 6. **Verify:** Independent transcript and summary | |
| ### ✅ Test 3: Podcast → Names → YouTube | |
| 1. Download podcast | |
| 2. Transcribe, detect speaker names | |
| 3. Fetch YouTube audio | |
| 4. **Verify:** Podcast speaker names cleared | |
| 5. Transcribe YouTube | |
| 6. **Verify:** No podcast names visible | |
| ### ✅ Test 4: Rapid Source Changes | |
| 1. Upload file | |
| 2. Immediately fetch YouTube (before transcription) | |
| 3. **Verify:** File data cleared, YouTube ready | |
| 4. Immediately download podcast | |
| 5. **Verify:** YouTube data cleared, podcast ready | |
| ### ✅ Test 5: Same Source Reload | |
| 1. Upload "audio.mp3", transcribe, edit | |
| 2. Upload same "audio.mp3" again | |
| 3. **Verify:** Previous edits cleared (fresh start) | |
| ### ✅ Test 6: Timeline Visualization | |
| 1. Upload audio, transcribe (timeline segments appear) | |
| 2. Upload different audio | |
| 3. **Verify:** Timeline segments cleared (empty) | |
| 4. Transcribe new audio | |
| 5. **Verify:** New timeline segments appear | |
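Steps 1 to 5 of Test 1 can also be exercised outside the browser. The following sketch assumes simple stand-ins for the DOM nodes in `elements`; `fakeElement()` and `simulateUpload()` are hypothetical, and `renderTimelineSegments()` and `setStatus()` are omitted because their implementations are not shown in this document.

```javascript
// Hypothetical stand-in for a DOM node: only the members the reset code touches.
function fakeElement() {
  const classes = new Set();
  return {
    innerHTML: '',
    textContent: '',
    classList: {
      add: (name) => classes.add(name),
      contains: (name) => classes.has(name),
    },
  };
}

const elements = {
  transcriptList: fakeElement(),
  utteranceCount: fakeElement(),
  diarizationPanel: fakeElement(),
  summaryOutput: fakeElement(),
  titleOutput: fakeElement(),
  detectSpeakerNamesBtn: fakeElement(),
};

const state = {
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: {},
  summary: '',
  title: '',
  uploadedFile: null,
  audioUrl: null,
};
let activeUtteranceIndex = -1;

function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}

// Same steps as Change 1, minus renderTimelineSegments() and setStatus().
function resetCompleteSession() {
  resetTranscriptionState();
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  elements.detectSpeakerNamesBtn.classList.add('hidden');
}

// Hypothetical, simplified upload handler that takes a file-like object.
function simulateUpload(file) {
  resetCompleteSession();
  state.uploadedFile = file;
  state.audioUrl = null;
}

// Steps 1-3: first file, transcript, speaker names, summary.
simulateUpload({ name: 'audio1.mp3' });
state.utterances = [{ speaker: 0, text: 'Hi' }, { speaker: 1, text: 'Hello' }];
state.speakerNames = { 0: 'Alice', 1: 'Bob' };
state.summary = 'Summary 1';
elements.summaryOutput.innerHTML = '<p>Summary 1</p>';

// Step 4: second file.
simulateUpload({ name: 'audio2.mp3' });

// Step 5: everything session-specific should now be empty.
const checks = {
  speakerNamesCleared: Object.keys(state.speakerNames).length === 0,
  summaryCleared: state.summary === '' && elements.summaryOutput.innerHTML === '',
  titleCleared: state.title === '' && elements.titleOutput.textContent === '',
  transcriptCleared: state.utterances.length === 0,
  newFileLoaded: state.uploadedFile.name === 'audio2.mp3',
};
console.log(checks);
```

This only covers state and the stubbed elements; the speaker tag rendering in steps 6 and 7 still needs to be verified manually in the browser.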
| ## Performance Considerations | |
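| - **resetCompleteSession():** Constant-time for the state fields; clearing `transcriptList.innerHTML` scales with the number of rendered utterances being removed | |
| - **Called only on source change:** Infrequent user action | |
| - **Impact:** Expected to be negligible next to downloading or transcribing audio (not measured) | |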
| ## Backward Compatibility | |
| - ✅ Existing `resetTranscriptionState()` unchanged | |
| - ✅ New function adds capability, doesn't break existing code | |
| - ✅ No API changes required | |
| - ✅ No breaking changes to user workflow | |
| ## Implementation Checklist | |
| - [ ] Create `resetCompleteSession()` function | |
| - [ ] Update `handleFileUpload()` to call reset | |
| - [ ] Update `handleYoutubeFetch()` to call reset | |
| - [ ] Update `downloadEpisode()` to call reset | |
| - [ ] Test all source change scenarios | |
| - [ ] Verify UI elements cleared | |
| - [ ] Verify no data contamination between sessions | |
| - [ ] Update documentation | |
| - [ ] Commit changes | |
| ## Related Bugs | |
| - Bug 2.4.1: Manual speaker name propagation (Fixed) | |
| - Bug 2.4.2: Auto-detection UI update (Fixed) | |
| - Bug 2.4.3: Clear name to enable detection (Fixed) | |
| - Bug 2.4.4: State persistence across audio files (This bug) | |
| ## Files to Modify | |
| ### `/home/luigi/VoxSum/frontend/app.js` | |
| - **New Function:** `resetCompleteSession()` (after line 273) | |
| - **Modify:** `handleFileUpload()` (line ~1122) | |
| - **Modify:** `handleYoutubeFetch()` (line ~1141) | |
| - **Modify:** `downloadEpisode()` (line ~1239) | |
| - **Impact:** ~40 lines added/modified | |
| ## Conclusion | |
| The bug is caused by incomplete state reset when audio sources change. The solution is to create a comprehensive `resetCompleteSession()` function that clears ALL session data (transcription, speaker names, summary, title) and call it whenever a new audio source is loaded (file upload, YouTube, podcast). This ensures a clean slate for each audio file and prevents data contamination between sessions. | |