State Persistence Bug - Analysis and Fix
Date
October 1, 2025
Overview
Speaker names, summary, and title from the first audio file persist and are incorrectly displayed after a different audio source is loaded (file upload, YouTube, or podcast).
Problem Statement (Bug 2.4.4)
User Story
As a user, when I:
- Load an audio file and transcribe it
- Edit/detect speaker names (e.g., "Alice", "Bob")
- Generate a summary and title
- Load a DIFFERENT audio file (upload, YouTube, podcast)
Expected: Clean slate, with no speaker names, no summary, and no title.
Actual: The previous speaker names, summary, and title are still visible.
Visual Example
Scenario:
Step 1: Load "podcast_interview.mp3"
- Transcribe → 2 speakers detected
- Edit names: Speaker 0 = "Alice", Speaker 1 = "Bob"
- Generate summary: "Interview about AI..."
- Title: "AI Discussion with Alice"
Step 2: Load "meeting_recording.mp3" (different audio)
- Audio player shows new file ✅
- Transcript: EMPTY (not yet transcribed) ✅
- Speaker names: Still shows "Alice", "Bob" from previous file ❌
- Summary: Still shows "Interview about AI..." ❌
- Title: Still shows "AI Discussion with Alice" ❌
Step 3: Transcribe new audio
- New transcript appears with 3 speakers
- Tags show: "Alice", "Bob", "Speaker 3" (mixed old/new!) ❌
- Summary: Still old summary ❌
Impact
- Confusion: Users see speaker names from different audio files
- Data Integrity: Mixed data from multiple sessions
- Trust Issue: Users can't trust the displayed information
- UX Problem: Must manually clear/reset before each new file
Root Cause Analysis
Current State Management
State Object:
const state = {
  config: { moonshine: {}, sensevoice: {}, llms: {} },
  backend: 'sensevoice',
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: {}, // ❌ NOT reset when source changes
  summary: '', // ❌ NOT reset when source changes
  title: '', // ❌ NOT reset when source changes
  audioUrl: null,
  sourcePath: null,
  uploadedFile: null,
  transcribing: false,
  summarizing: false,
  detectingSpeakerNames: false,
  transcriptionController: null,
  summaryController: null,
};
Existing Reset Function
Location: frontend/app.js:resetTranscriptionState() (lines 265-273)
function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
  // ❌ MISSING: state.speakerNames = {};
  // ❌ MISSING: state.summary = '';
  // ❌ MISSING: state.title = '';
  // ❌ MISSING: Clear summary/title UI elements
}
Called only by: handleTranscription() (line 302)
Source Change Functions
Function 1: handleFileUpload() (lines 1119-1127)
function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
  // ❌ MISSING: No call to reset state
}
Function 2: handleYoutubeFetch() (lines 1129-1147)
async function handleYoutubeFetch() {
  // ... fetch logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('YouTube audio ready', 'success');
  // ❌ MISSING: No call to reset state
}
Function 3: downloadEpisode() (lines 1226-1258)
async function downloadEpisode(audioUrl, title, triggerButton = null) {
  // ... download logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('Episode ready', 'success');
  // ❌ MISSING: No call to reset state
}
Why It Happens
Problem Flow:
1. User loads Audio A
   → state.speakerNames, summary, title are empty
2. User transcribes Audio A
   → resetTranscriptionState() called (clears transcript, but NOT speaker names)
   → Transcription creates new utterances
   → state.speakerNames gets populated
3. User edits speaker names, generates summary
   → state.speakerNames = { 0: "Alice", 1: "Bob" }
   → state.summary = "Interview..."
   → state.title = "AI Discussion"
4. User loads Audio B (via upload, YouTube, or podcast)
   → handleFileUpload/handleYoutubeFetch/downloadEpisode called
   → Audio player source changed ✅
   → state.audioUrl/uploadedFile updated ✅
   → BUT state.speakerNames, summary, title NOT cleared ❌
5. User transcribes Audio B
   → resetTranscriptionState() called
   → Clears utterances, diarization stats ✅
   → BUT does NOT clear speakerNames, summary, title ❌
   → New transcription with old speaker names appears!
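The flow above can be reproduced without the DOM. The following is a minimal sketch, assuming a reduced state object and hypothetical stand-ins (loadSource, transcribe, speakerLabel) for the real handlers; it only models which fields each step touches and is not the code from app.js.

```javascript
// Reduced model of the app state: only the fields relevant to the bug.
const state = {
  utterances: [],
  speakerNames: {},
  summary: '',
  title: '',
  audioUrl: null,
};

// Mirrors the pre-fix resetTranscriptionState(): transcript data only.
function resetTranscriptionState() {
  state.utterances = [];
}

// Hypothetical stand-in for the three source handlers before the fix:
// the source changes, but no session data is cleared.
function loadSource(url) {
  state.audioUrl = url;
}

// Hypothetical stand-in for handleTranscription().
function transcribe(speakerCount) {
  resetTranscriptionState();
  for (let i = 0; i < speakerCount; i += 1) {
    state.utterances.push({ speaker: i, text: `utterance ${i}` });
  }
}

// Display label: custom name if one is stored, else a default tag.
function speakerLabel(index) {
  return state.speakerNames[index] ?? `Speaker ${index + 1}`;
}

// Audio A: transcribe, rename speakers, summarize.
loadSource('a.mp3');
transcribe(2);
state.speakerNames = { 0: 'Alice', 1: 'Bob' };
state.summary = 'Interview about AI...';

// Audio B: stale names and summary survive the source change.
loadSource('b.mp3');
transcribe(3);
const labels = state.utterances.map((u) => speakerLabel(u.speaker));
console.log(labels.join(', ')); // Alice, Bob, Speaker 3
console.log(state.summary);     // Interview about AI...
```

The mixed labels match Step 3 of the Visual Example: two stale names plus one default tag.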
Solution Design
Design Principles
- Complete Reset: Clear ALL session-specific data when source changes
- Clear Intent: Reset should happen immediately when a new source is loaded
- Separation of Concerns:
  - Transcription reset: Clear transcription-related data
  - Session reset: Clear ALL session data including summary, title, speaker names
- Consistent Behavior: Same reset logic for all source types (upload, YouTube, podcast)
Two-Level Reset Strategy
Level 1: Reset Transcription Data (Existing)
When: Before starting new transcription
What: Utterances, diarization stats, transcript UI
function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}
Level 2: Reset Complete Session (NEW)
When: When new audio source is loaded
What: Everything from Level 1 + speaker names + summary + title
function resetCompleteSession() {
  // Level 1: Reset transcription data
  resetTranscriptionState();
  // Level 2 additions start here: speaker names
  state.speakerNames = {};
  // Summary and title (state and UI)
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  // Timeline segments
  renderTimelineSegments(); // Will be empty with no utterances
  // Optional: Hide detect speaker names button
  elements.detectSpeakerNamesBtn.classList.add('hidden');
}
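A reset that lists fields by hand can drift when new session fields are added to state later. One alternative, sketched here under the assumption that session-scoped fields can be grouped, is to derive both the initial state and the reset from a single defaults factory. The names sessionDefaults and resetSessionFields are hypothetical and do not exist in app.js; the UI clearing shown above would still be needed alongside this.

```javascript
// Hypothetical factory: the one place where session-scoped defaults live.
// Returning a fresh object on every call means resets never share
// array or object references with earlier sessions.
function sessionDefaults() {
  return {
    utterances: [],
    diarizedUtterances: null,
    diarizationStats: null,
    speakerNames: {},
    summary: '',
    title: '',
  };
}

const state = {
  backend: 'sensevoice', // app-level setting, should survive a reset
  ...sessionDefaults(),
};

// Overwrites only the session-scoped fields; app-level fields are untouched.
function resetSessionFields() {
  Object.assign(state, sessionDefaults());
}

// Simulate a used session, then reset it.
state.speakerNames[0] = 'Alice';
state.summary = 'Interview about AI...';
resetSessionFields();

console.log(Object.keys(state.speakerNames).length); // 0
console.log(state.backend);                          // sensevoice
```

With this shape, adding a new session field means editing sessionDefaults() only, and the reset picks it up automatically.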
Implementation
Change 1: Create resetCompleteSession() Function
File: frontend/app.js (after resetTranscriptionState())
function resetCompleteSession() {
  // Reset transcription data
  resetTranscriptionState();
  // Reset speaker names
  state.speakerNames = {};
  // Reset summary and title
  state.summary = '';
  state.title = '';
  // Clear summary and title UI
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  // Reset timeline visualization
  renderTimelineSegments();
  // Hide speaker name detection button
  elements.detectSpeakerNamesBtn.classList.add('hidden');
  // Reset status
  setStatus('Ready for new transcription', 'info');
}
Change 2: Call Reset on File Upload
File: frontend/app.js:handleFileUpload() (lines ~1119-1127)
function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  // Reset complete session when new file loaded
  resetCompleteSession();
  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
}
Change 3: Call Reset on YouTube Fetch
File: frontend/app.js:handleYoutubeFetch() (lines ~1129-1147)
async function handleYoutubeFetch() {
  if (!elements.youtubeUrl.value.trim()) return;
  setStatus('Downloading audio from YouTube...', 'info');
  try {
    const res = await fetch('/api/youtube/fetch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: elements.youtubeUrl.value.trim() }),
    });
    if (!res.ok) throw new Error('YouTube download failed');
    const data = await res.json();
    // Reset complete session when new YouTube audio loaded
    resetCompleteSession();
    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('YouTube audio ready', 'success');
  } catch (err) {
    console.error(err);
    setStatus(err.message, 'error');
  }
}
Change 4: Call Reset on Podcast Episode Download
File: frontend/app.js:downloadEpisode() (lines ~1226-1258)
async function downloadEpisode(audioUrl, title, triggerButton = null) {
  setStatus('Downloading episode...', 'info');
  let originalLabel = null;
  if (triggerButton) {
    originalLabel = triggerButton.innerHTML;
    triggerButton.disabled = true;
    triggerButton.classList.add('loading');
    triggerButton.textContent = 'Downloading…';
  }
  try {
    const res = await fetch('/api/podcast/download', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ audioUrl, title }),
    });
    if (!res.ok) throw new Error('Episode download failed');
    const data = await res.json();
    // Reset complete session when new episode loaded
    resetCompleteSession();
    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('Episode ready', 'success');
    // ... rest of the function
  } catch (err) {
    // ... error handling
  }
}
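Changes 2 to 4 repeat the same sequence: reset the session, update state.audioUrl and state.uploadedFile, then point the player at the new source. One way to keep the three handlers consistent is to route them through a single helper. The sketch below assumes a hypothetical setAudioSource() function that is not part of the existing code; state, elements, and resetCompleteSession are reduced stubs so the example runs on its own.

```javascript
// Stubs standing in for the real app objects (assumptions for this sketch).
const state = {
  audioUrl: null,
  uploadedFile: null,
  speakerNames: { 0: 'Alice' },
  summary: 'old summary',
  title: 'old title',
};
const elements = { audioPlayer: { src: '' } };

// Reduced stand-in for the real resetCompleteSession().
function resetCompleteSession() {
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
}

// Hypothetical helper: the only place where the audio source changes.
// Pass either `file` (upload) or `url` (YouTube, podcast); `playerSrc`
// is what the audio element should load.
function setAudioSource({ file = null, url = null, playerSrc }) {
  resetCompleteSession(); // always start from a clean session
  state.uploadedFile = file;
  state.audioUrl = url;
  elements.audioPlayer.src = playerSrc;
}

// Usage from a YouTube or podcast handler:
setAudioSource({ url: '/media/episode.mp3', playerSrc: '/media/episode.mp3' });

console.log(state.summary === '');     // true
console.log(elements.audioPlayer.src); // /media/episode.mp3
```

In handleFileUpload() the equivalent call would pass the File object and an object URL created from it. Because the reset lives inside the helper, a future fourth source type cannot forget to call it.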
Behavior After Fix
Example Scenario
Step 1: Load and Process First Audio
1. Upload "interview.mp3"
   → resetCompleteSession() called
   → Clean slate: no utterances, speaker names, summary, title
2. Transcribe
   → resetTranscriptionState() called (redundant but harmless)
   → Transcript appears, 2 speakers detected
3. Edit speaker names
   → state.speakerNames = { 0: "Alice", 1: "Bob" }
4. Generate summary
   → state.summary = "Interview about AI..."
   → state.title = "AI Discussion"
Step 2: Load Different Audio
1. Upload "meeting.mp3"
   → resetCompleteSession() called ✅
   → state.speakerNames = {} (cleared) ✅
   → state.summary = '' (cleared) ✅
   → state.title = '' (cleared) ✅
   → Summary UI cleared ✅
   → Title UI cleared ✅
   → Timeline cleared ✅
   → Status: "Loaded meeting.mp3"
2. Transcribe
   → Fresh transcript with 3 speakers
   → Speaker tags show: "Speaker 1", "Speaker 2", "Speaker 3" ✅
   → No contamination from previous audio ✅
Step 3: Generate New Summary
1. Click "Generate Summary"
   → New summary generated for current audio ✅
   → Replaces old summary (already cleared) ✅
   → New title generated ✅
Edge Cases
Edge Case 1: Upload Same File Twice
1. Upload "audio.mp3"
   → resetCompleteSession() called
2. Transcribe and edit
3. Upload same "audio.mp3" again
   → resetCompleteSession() called (data cleared)
   → User must transcribe again
Decision: Acceptable - user explicitly chose to reload
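One caveat for this edge case: browsers generally do not fire a change event when the user re-selects the file that is already held by the input, so handleFileUpload() may never run for the second upload. A common workaround is to clear the input's value once the file has been read. The sketch below uses a plain object in place of the real file input, stubs resetCompleteSession, and names the variant handleFileUploadWithClear; all three are assumptions for illustration, not existing code.

```javascript
// Stub that counts calls so the effect is observable in this sketch.
let resetCount = 0;
function resetCompleteSession() {
  resetCount += 1;
}

const state = { uploadedFile: null, audioUrl: null };

// Hypothetical variant of handleFileUpload() (player and status updates omitted).
function handleFileUploadWithClear(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  resetCompleteSession();
  state.uploadedFile = file; // keep our own reference to the file
  state.audioUrl = null;
  // Clearing the value lets the same file trigger `change` again next time.
  event.target.value = '';
}

// Fake input element standing in for the real file input.
const fakeInput = {
  files: [{ name: 'audio.mp3' }],
  value: 'C:\\fakepath\\audio.mp3',
};
handleFileUploadWithClear({ target: fakeInput });

console.log(resetCount);              // 1
console.log(fakeInput.value === '');  // true
console.log(state.uploadedFile.name); // audio.mp3
```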
Edge Case 2: Change Source During Transcription
1. Start transcription of "audio1.mp3"
2. Mid-transcription, upload "audio2.mp3"
   → resetCompleteSession() called
   → Partial transcription cleared
   → New audio loaded
Decision: Acceptable - user action indicates intent to switch
Note: Transcription abort handling already exists
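This edge case depends on the in-flight request actually being cancelled; otherwise late results for audio1 could be appended after the reset. The state object already carries a transcriptionController field. The sketch below shows how a reset could cancel it, assuming that field holds a standard AbortController (the source does not show how it is used). The helper name abortPendingTranscription is hypothetical.

```javascript
const state = {
  transcribing: false,
  transcriptionController: null,
};

// Hypothetical helper: cancel any in-flight transcription before resetting.
function abortPendingTranscription() {
  if (state.transcriptionController) {
    // A fetch() started with this controller's signal rejects with an AbortError.
    state.transcriptionController.abort();
    state.transcriptionController = null;
  }
  state.transcribing = false;
}

// Simulate a transcription that is in progress.
const controller = new AbortController();
state.transcriptionController = controller;
state.transcribing = true;

// A new source arrives mid-transcription.
abortPendingTranscription();

console.log(controller.signal.aborted);     // true
console.log(state.transcriptionController); // null
```

If the existing abort handling is only triggered by a cancel button, calling a helper like this at the top of resetCompleteSession() would cover source changes as well.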
Edge Case 3: YouTube Fetch While Audio Playing
1. Upload file, play audio
2. Fetch YouTube audio
   → resetCompleteSession() called
   → Audio player source changed
   → Playback stops (normal behavior)
Decision: Acceptable - expected behavior when changing source
Edge Case 4: Multiple Podcast Episodes in Sequence
1. Download episode 1
   → resetCompleteSession()
2. Transcribe episode 1
3. Download episode 2
   → resetCompleteSession() (episode 1 data cleared)
4. Transcribe episode 2
Decision: Correct behavior - each episode is independent
UI Elements to Reset
Complete Checklist
State Variables:
- state.utterances (via resetTranscriptionState)
- state.diarizedUtterances (via resetTranscriptionState)
- state.diarizationStats (via resetTranscriptionState)
- state.speakerNames (NEW)
- state.summary (NEW)
- state.title (NEW)
- activeUtteranceIndex (via resetTranscriptionState)
DOM Elements:
- elements.transcriptList (via resetTranscriptionState)
- elements.utteranceCount (via resetTranscriptionState)
- elements.diarizationPanel (via resetTranscriptionState)
- elements.diarizationMetrics (via renderDiarizationStats after reset)
- elements.speakerBreakdown (via renderDiarizationStats after reset)
- elements.summaryOutput (NEW)
- elements.titleOutput (NEW)
- elements.timelineSegments (via renderTimelineSegments)
- elements.detectSpeakerNamesBtn visibility (NEW)
Testing Scenarios
✅ Test 1: Upload → Edit → Upload New
- Upload "audio1.mp3"
- Transcribe, edit speaker names to "Alice", "Bob"
- Generate summary "Summary 1"
- Upload "audio2.mp3"
- Verify: Speaker names cleared, summary cleared, title cleared
- Transcribe
- Verify: Speaker tags show "Speaker 1", "Speaker 2" (not Alice/Bob)
✅ Test 2: YouTube → Summary → Podcast
- Fetch YouTube audio
- Transcribe, generate summary
- Download podcast episode
- Verify: YouTube summary cleared
- Transcribe podcast
- Verify: Independent transcript and summary
✅ Test 3: Podcast → Names → YouTube
- Download podcast
- Transcribe, detect speaker names
- Fetch YouTube audio
- Verify: Podcast speaker names cleared
- Transcribe YouTube
- Verify: No podcast names visible
✅ Test 4: Rapid Source Changes
- Upload file
- Immediately fetch YouTube (before transcription)
- Verify: File data cleared, YouTube ready
- Immediately download podcast
- Verify: YouTube data cleared, podcast ready
✅ Test 5: Same Source Reload
- Upload "audio.mp3", transcribe, edit
- Upload same "audio.mp3" again
- Verify: Previous edits cleared (fresh start)
✅ Test 6: Timeline Visualization
- Upload audio, transcribe (timeline segments appear)
- Upload different audio
- Verify: Timeline segments cleared (empty)
- Transcribe new audio
- Verify: New timeline segments appear
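The state and UI checks in Test 1 can also be automated. The sketch below is one possible harness, not the project's test suite: it replaces elements with plain objects, stubs renderTimelineSegments and setStatus, and copies the two reset functions from this document. The names makeElement and findStaleFields are hypothetical.

```javascript
// Minimal stand-in for a DOM element (only the members the resets use).
function makeElement() {
  const classes = new Set();
  return {
    innerHTML: '',
    textContent: '',
    classList: { add: (c) => classes.add(c), contains: (c) => classes.has(c) },
  };
}

const elements = {
  transcriptList: makeElement(),
  utteranceCount: makeElement(),
  diarizationPanel: makeElement(),
  summaryOutput: makeElement(),
  titleOutput: makeElement(),
  detectSpeakerNamesBtn: makeElement(),
};

const state = {
  utterances: [], diarizedUtterances: null, diarizationStats: null,
  speakerNames: {}, summary: '', title: '',
};
let activeUtteranceIndex = -1;

// Stubs for functions defined elsewhere in app.js.
function renderTimelineSegments() {}
function setStatus() {}

function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}

function resetCompleteSession() {
  resetTranscriptionState();
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  renderTimelineSegments();
  elements.detectSpeakerNamesBtn.classList.add('hidden');
  setStatus('Ready for new transcription', 'info');
}

// Hypothetical helper: names of session fields that still hold data.
function findStaleFields() {
  const stale = [];
  if (state.utterances.length) stale.push('utterances');
  if (Object.keys(state.speakerNames).length) stale.push('speakerNames');
  if (state.summary) stale.push('summary');
  if (state.title) stale.push('title');
  if (elements.summaryOutput.innerHTML) stale.push('summaryOutput');
  if (elements.titleOutput.textContent) stale.push('titleOutput');
  return stale;
}

// Simulate a finished session on audio1.mp3 ...
state.utterances = [{ speaker: 0, text: 'hi' }];
state.speakerNames = { 0: 'Alice', 1: 'Bob' };
state.summary = 'Summary 1';
state.title = 'Title 1';
elements.summaryOutput.innerHTML = '<p>Summary 1</p>';
elements.titleOutput.textContent = 'Title 1';
const before = findStaleFields();

// ... then load audio2.mp3.
resetCompleteSession();
const after = findStaleFields();

console.log(before.length, after.length); // 6 0
```

Running the same check against the pre-fix resetTranscriptionState() alone would leave five of the six fields stale, which is the bug in miniature.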
Performance Considerations
- resetCompleteSession(): A fixed number of assignments plus DOM clearing; the only variable cost is removing the rendered transcript nodes
- Called only on source change: Infrequent user action
- Impact: Negligible (estimated <1ms)
Backward Compatibility
- ✅ Existing resetTranscriptionState() unchanged
- ✅ New function adds capability, doesn't break existing code
- ✅ No API changes required
- ✅ No breaking changes to user workflow
Implementation Checklist
- Create resetCompleteSession() function
- Update handleFileUpload() to call reset
- Update handleYoutubeFetch() to call reset
- Update downloadEpisode() to call reset
- Test all source change scenarios
- Verify UI elements cleared
- Verify no data contamination between sessions
- Update documentation
- Commit changes
Related Bugs
- Bug 2.4.1: Manual speaker name propagation (Fixed)
- Bug 2.4.2: Auto-detection UI update (Fixed)
- Bug 2.4.3: Clear name to enable detection (Fixed)
- Bug 2.4.4: State persistence across audio files (This bug)
Files to Modify
/home/luigi/VoxSum/frontend/app.js
- New Function: resetCompleteSession() (after line 273)
- Modify: handleFileUpload() (line ~1122)
- Modify: handleYoutubeFetch() (line ~1141)
- Modify: downloadEpisode() (line ~1239)
- Impact: ~40 lines added/modified
Conclusion
The bug is caused by incomplete state reset when audio sources change. The solution is to create a comprehensive resetCompleteSession() function that clears ALL session data (transcription, speaker names, summary, title) and call it whenever a new audio source is loaded (file upload, YouTube, podcast). This ensures a clean slate for each audio file and prevents data contamination between sessions.