# State Persistence Bug - Analysis and Fix
## Date
October 1, 2025
## Overview
Speaker names, the summary, and the title from the first audio file persist and are incorrectly displayed after a different audio source is loaded (file upload, YouTube, or podcast).
## Problem Statement (Bug 2.4.4)
### User Story
**As a user**, when I:
1. Load an audio file and transcribe it
2. Edit/detect speaker names (e.g., "Alice", "Bob")
3. Generate summary and title
4. Load a DIFFERENT audio file (upload, YouTube, podcast)
5. **Expected:** Clean slate - no speaker names, no summary, no title
6. **Actual:** Previous speaker names, summary, and title still visible
### Visual Example
**Scenario:**
```
Step 1: Load "podcast_interview.mp3"
- Transcribe → 2 speakers detected
- Edit names: Speaker 0 = "Alice", Speaker 1 = "Bob"
- Generate summary: "Interview about AI..."
- Title: "AI Discussion with Alice"
Step 2: Load "meeting_recording.mp3" (different audio)
- Audio player shows new file ✓
- Transcript: EMPTY (not yet transcribed) ✓
- Speaker names: Still shows "Alice", "Bob" from previous file ✗
- Summary: Still shows "Interview about AI..." ✗
- Title: Still shows "AI Discussion with Alice" ✗
Step 3: Transcribe new audio
- New transcript appears with 3 speakers
- Tags show: "Alice", "Bob", "Speaker 3" (mixed old/new!) ✗
- Summary: Still old summary ✗
```
### Impact
- **Confusion:** Users see speaker names from different audio files
- **Data Integrity:** Mixed data from multiple sessions
- **Trust Issue:** Users can't trust the displayed information
- **UX Problem:** Must manually clear/reset before each new file
## Root Cause Analysis
### Current State Management
**State Object:**
```javascript
const state = {
  config: { moonshine: {}, sensevoice: {}, llms: {} },
  backend: 'sensevoice',
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: {},   // ❌ NOT reset when source changes
  summary: '',        // ❌ NOT reset when source changes
  title: '',          // ❌ NOT reset when source changes
  audioUrl: null,
  sourcePath: null,
  uploadedFile: null,
  transcribing: false,
  summarizing: false,
  detectingSpeakerNames: false,
  transcriptionController: null,
  summaryController: null,
};
```
### Existing Reset Function
**Location:** `frontend/app.js:resetTranscriptionState()` (lines 265-273)
```javascript
function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
  // ❌ MISSING: state.speakerNames = {};
  // ❌ MISSING: state.summary = '';
  // ❌ MISSING: state.title = '';
  // ❌ MISSING: Clear summary/title UI elements
}
```
**Called only by:** `handleTranscription()` (line 302)
### Source Change Functions
#### Function 1: `handleFileUpload()` (lines 1119-1127)
```javascript
function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
  // ❌ MISSING: No call to reset state
}
```
#### Function 2: `handleYoutubeFetch()` (lines 1129-1147)
```javascript
async function handleYoutubeFetch() {
  // ... fetch logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('YouTube audio ready', 'success');
  // ❌ MISSING: No call to reset state
}
```
#### Function 3: `downloadEpisode()` (lines 1226-1258)
```javascript
async function downloadEpisode(audioUrl, title, triggerButton = null) {
  // ... download logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('Episode ready', 'success');
  // ❌ MISSING: No call to reset state
}
```
### Why It Happens
**Problem Flow:**
```
1. User loads Audio A
→ state.speakerNames, summary, title are empty
2. User transcribes Audio A
→ resetTranscriptionState() called (clears transcript, but NOT speaker names)
→ Transcription creates new utterances
→ state.speakerNames gets populated
3. User edits speaker names, generates summary
→ state.speakerNames = { 0: "Alice", 1: "Bob" }
→ state.summary = "Interview..."
→ state.title = "AI Discussion"
4. User loads Audio B (via upload, YouTube, or podcast)
→ handleFileUpload/handleYoutubeFetch/downloadEpisode called
→ Audio player source changed ✓
→ state.audioUrl/uploadedFile updated ✓
→ BUT state.speakerNames, summary, title NOT cleared ✗
5. User transcribes Audio B
→ resetTranscriptionState() called
→ Clears utterances, diarization stats ✓
→ BUT does NOT clear speakerNames, summary, title ✗
→ New transcription with old speaker names appears!
```
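The flow above can be reproduced without a browser. The following is a minimal sketch, assuming a pared-down `state` and a hypothetical `speakerLabel()` helper that falls back to a 1-based "Speaker N" label; the real rendering code in `frontend/app.js` is not shown in this document and may differ.

```javascript
// Pared-down stand-in for the app state (only the fields relevant to the bug).
const state = {
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: {},
  summary: '',
  title: '',
};

// Pre-fix reset: transcription data only (DOM updates omitted in this sketch).
function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
}

// Hypothetical label helper: custom name if present, else "Speaker N" (1-based).
function speakerLabel(speakerId) {
  return state.speakerNames[speakerId] ?? `Speaker ${speakerId + 1}`;
}

// Session A: transcribe, rename speakers, summarize.
resetTranscriptionState();
state.utterances = [{ speaker: 0 }, { speaker: 1 }];
state.speakerNames = { 0: 'Alice', 1: 'Bob' };
state.summary = 'Interview about AI...';

// Session B: a new source is loaded (no reset happens), then transcribed.
resetTranscriptionState();
state.utterances = [{ speaker: 0 }, { speaker: 1 }, { speaker: 2 }];

const labelsAfterSwitch = state.utterances.map((u) => speakerLabel(u.speaker));
console.log(labelsAfterSwitch); // names from session A leak into session B
console.log(state.summary);     // still the summary written for session A
```

Only the third speaker gets a fresh label, which matches the mixed old/new tags described in the visual example.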
## Solution Design
### Design Principles
1. **Complete Reset:** Clear ALL session-specific data when source changes
2. **Clear Intent:** Reset should happen immediately when new source loaded
3. **Separation of Concerns:**
- Transcription reset: Clear transcription-related data
- Session reset: Clear ALL session data including summary, title, speaker names
4. **Consistent Behavior:** Same reset logic for all source types (upload, YouTube, podcast)
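One way to honor the "Consistent Behavior" principle is to route every source type through a single entry point, so the reset cannot be forgotten in one handler. The sketch below assumes a hypothetical `setAudioSource()` helper, a hypothetical `resetSession()` function, and plain-object stand-ins for `state` and the audio player; none of these names come from `frontend/app.js`, and the URL is a placeholder.

```javascript
// Plain-object stand-ins so the sketch runs outside a browser.
const state = {
  audioUrl: null,
  uploadedFile: null,
  speakerNames: {},
  summary: '',
  title: '',
};
const audioPlayer = { src: '' };

// Hypothetical session reset (state only; UI clearing omitted here).
function resetSession() {
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
}

// Hypothetical single entry point shared by upload, YouTube, and podcast paths.
// Pass either `file` or `url`; `playerSrc` is what the player should load.
function setAudioSource({ file = null, url = null, playerSrc }) {
  resetSession(); // always reset first, whatever the source type
  state.uploadedFile = file;
  state.audioUrl = url;
  audioPlayer.src = playerSrc;
}

// Usage: a remote source replaces an earlier session.
state.speakerNames = { 0: 'Alice' };
state.summary = 'Old summary';
setAudioSource({ url: '/media/episode.mp3', playerSrc: '/media/episode.mp3' });
console.log(state.speakerNames, state.summary, state.audioUrl);
```

The fix described in this document instead adds an explicit reset call to each of the three handlers, which is a smaller change to existing code.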
### Two-Level Reset Strategy
#### Level 1: Reset Transcription Data (Existing)
**When:** Before starting new transcription
**What:** Utterances, diarization stats, transcript UI
```javascript
function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}
```
#### Level 2: Reset Complete Session (NEW)
**When:** When new audio source is loaded
**What:** Everything from Level 1 + speaker names + summary + title
```javascript
function resetCompleteSession() {
  // Level 1: reset transcription data
  resetTranscriptionState();
  // Level 2 additions: speaker names
  state.speakerNames = {};
  // Summary and title (state and UI)
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  // Timeline segments
  renderTimelineSegments(); // Will be empty with no utterances
  // Optional: hide the detect-speaker-names button
  elements.detectSpeakerNamesBtn.classList.add('hidden');
}
```
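As a variation on the two-level idea, the session-scoped fields can be listed once as defaults and restored in a single assignment, which makes it harder to forget a field when new session data is added later. This is a sketch under that assumption, not the fix described in this document; `createSessionDefaults()` and `resetSessionState()` are hypothetical names, and DOM updates are left out.

```javascript
// Hypothetical factory for session-scoped defaults. A function is used so each
// reset gets fresh array/object instances rather than shared references.
function createSessionDefaults() {
  return {
    utterances: [],
    diarizedUtterances: null,
    diarizationStats: null,
    speakerNames: {},
    summary: '',
    title: '',
  };
}

const state = {
  backend: 'sensevoice', // app-level setting: must survive a session reset
  ...createSessionDefaults(),
};

// Overwrites only the session-scoped keys; other keys are left untouched.
function resetSessionState() {
  Object.assign(state, createSessionDefaults());
}

// Usage: populate a session, then reset it.
state.speakerNames = { 0: 'Alice', 1: 'Bob' };
state.summary = 'Interview about AI...';
state.title = 'AI Discussion';
resetSessionState();
console.log(state);
```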
## Implementation
### Change 1: Create `resetCompleteSession()` Function
**File:** `frontend/app.js` (after `resetTranscriptionState()`)
```javascript
function resetCompleteSession() {
  // Reset transcription data
  resetTranscriptionState();
  // Reset speaker names
  state.speakerNames = {};
  // Reset summary and title
  state.summary = '';
  state.title = '';
  // Clear summary and title UI
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  // Reset timeline visualization
  renderTimelineSegments();
  // Hide speaker name detection button
  elements.detectSpeakerNamesBtn.classList.add('hidden');
  // Reset status
  setStatus('Ready for new transcription', 'info');
}
```
### Change 2: Call Reset on File Upload
**File:** `frontend/app.js:handleFileUpload()` (lines ~1119-1127)
```javascript
function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  // Reset complete session when new file loaded
  resetCompleteSession();
  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
}
```
### Change 3: Call Reset on YouTube Fetch
**File:** `frontend/app.js:handleYoutubeFetch()` (lines ~1129-1147)
```javascript
async function handleYoutubeFetch() {
  if (!elements.youtubeUrl.value.trim()) return;
  setStatus('Downloading audio from YouTube...', 'info');
  try {
    const res = await fetch('/api/youtube/fetch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: elements.youtubeUrl.value.trim() }),
    });
    if (!res.ok) throw new Error('YouTube download failed');
    const data = await res.json();
    // Reset complete session when new YouTube audio loaded
    resetCompleteSession();
    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('YouTube audio ready', 'success');
  } catch (err) {
    console.error(err);
    setStatus(err.message, 'error');
  }
}
```
### Change 4: Call Reset on Podcast Episode Download
**File:** `frontend/app.js:downloadEpisode()` (lines ~1226-1258)
```javascript
async function downloadEpisode(audioUrl, title, triggerButton = null) {
  setStatus('Downloading episode...', 'info');
  let originalLabel = null;
  if (triggerButton) {
    originalLabel = triggerButton.innerHTML;
    triggerButton.disabled = true;
    triggerButton.classList.add('loading');
    triggerButton.textContent = 'Downloading…';
  }
  try {
    const res = await fetch('/api/podcast/download', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ audioUrl, title }),
    });
    if (!res.ok) throw new Error('Episode download failed');
    const data = await res.json();
    // Reset complete session when new episode loaded
    resetCompleteSession();
    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('Episode ready', 'success');
    // ... rest of the function
  } catch (err) {
    // ... error handling
  }
}
```
## Behavior After Fix
### Example Scenario
**Step 1: Load and Process First Audio**
```
1. Upload "interview.mp3"
→ resetCompleteSession() called
→ Clean slate: no utterances, speaker names, summary, title
2. Transcribe
→ resetTranscriptionState() called (redundant but harmless)
→ Transcript appears, 2 speakers detected
3. Edit speaker names
→ state.speakerNames = { 0: "Alice", 1: "Bob" }
4. Generate summary
→ state.summary = "Interview about AI..."
→ state.title = "AI Discussion"
```
**Step 2: Load Different Audio**
```
1. Upload "meeting.mp3"
→ resetCompleteSession() called ✓
→ state.speakerNames = {} (cleared) ✓
→ state.summary = '' (cleared) ✓
→ state.title = '' (cleared) ✓
→ Summary UI cleared ✓
→ Title UI cleared ✓
→ Timeline cleared ✓
→ Status: "Loaded meeting.mp3"
2. Transcribe
→ Fresh transcript with 3 speakers
→ Speaker tags show: "Speaker 1", "Speaker 2", "Speaker 3" ✓
→ No contamination from previous audio ✓
```
**Step 3: Generate New Summary**
```
1. Click "Generate Summary"
→ New summary generated for current audio ✓
→ Replaces old summary (already cleared) ✓
→ New title generated ✓
```
## Edge Cases
### Edge Case 1: Upload Same File Twice
```
1. Upload "audio.mp3"
→ resetCompleteSession() called
2. Transcribe and edit
3. Upload same "audio.mp3" again
→ resetCompleteSession() called (data cleared)
→ User must transcribe again
Decision: Acceptable - user explicitly chose to reload
```
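One browser detail is relevant to this edge case: a file input generally does not fire `change` when the user picks the file that is already selected, so the upload handler (and with it the reset) may not run on the second upload. A common workaround is to clear the input's value once the `File` has been read. The sketch below uses a plain object in place of the real DOM event and a hypothetical `onNewFile` callback; whether VoxSum already clears the input is not shown in this document.

```javascript
// Hypothetical wrapper around the upload path; returns true if a file was handled.
function handleFileSelection(event, onNewFile) {
  const file = event.target.files?.[0];
  if (!file) return false;
  onNewFile(file);
  // Clearing the value lets the browser fire `change` again for the same file.
  event.target.value = '';
  return true;
}

// Usage with a stand-in event object (a real handler receives a DOM Event).
const loaded = [];
const fakeEvent = {
  target: { files: [{ name: 'audio.mp3' }], value: 'C:\\fakepath\\audio.mp3' },
};
const handled = handleFileSelection(fakeEvent, (file) => loaded.push(file.name));
console.log(handled, loaded, fakeEvent.target.value);
```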
### Edge Case 2: Change Source During Transcription
```
1. Start transcription of "audio1.mp3"
2. Mid-transcription, upload "audio2.mp3"
→ resetCompleteSession() called
→ Partial transcription cleared
→ New audio loaded
Decision: Acceptable - user action indicates intent to switch
Note: Transcription abort handling already exists
```
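The note above relies on existing abort handling, which this document does not show. If the in-flight request is not cancelled on a source change, late results from the old audio could repopulate the freshly cleared state. A minimal sketch of cancelling pending work during the reset, assuming `state.transcriptionController` and `state.summaryController` hold standard `AbortController` instances (the `abortInFlightWork()` name is hypothetical):

```javascript
// Stand-in state with one request in flight.
const state = {
  transcriptionController: new AbortController(),
  summaryController: null,
};

// Hypothetical helper: cancel pending work tied to the previous audio source.
function abortInFlightWork() {
  for (const key of ['transcriptionController', 'summaryController']) {
    state[key]?.abort(); // skipped when no controller is set
    state[key] = null;
  }
}

// Usage: a listener on the signal observes the cancellation.
const { signal } = state.transcriptionController;
signal.addEventListener('abort', () => console.log('transcription cancelled'));
abortInFlightWork();
console.log(signal.aborted, state.transcriptionController);
```

A request started with `fetch(url, { signal })` rejects with an `AbortError` once the signal is aborted, so the caller's error handling would need to treat that case as a normal cancellation rather than a failure.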
### Edge Case 3: YouTube Fetch While Audio Playing
```
1. Upload file, play audio
2. Fetch YouTube audio
→ resetCompleteSession() called
→ Audio player source changed
→ Playback stops (normal behavior)
Decision: Acceptable - expected behavior when changing source
```
### Edge Case 4: Multiple Podcast Episodes in Sequence
```
1. Download episode 1
→ resetCompleteSession()
2. Transcribe episode 1
3. Download episode 2
→ resetCompleteSession() (episode 1 data cleared)
4. Transcribe episode 2
Decision: Correct behavior - each episode is independent
```
## UI Elements to Reset
### Complete Checklist
**State Variables:**
- [x] `state.utterances` (via resetTranscriptionState)
- [x] `state.diarizedUtterances` (via resetTranscriptionState)
- [x] `state.diarizationStats` (via resetTranscriptionState)
- [x] `state.speakerNames` (NEW)
- [x] `state.summary` (NEW)
- [x] `state.title` (NEW)
- [x] `activeUtteranceIndex` (via resetTranscriptionState)
**DOM Elements:**
- [x] `elements.transcriptList` (via resetTranscriptionState)
- [x] `elements.utteranceCount` (via resetTranscriptionState)
- [x] `elements.diarizationPanel` (via resetTranscriptionState)
- [x] `elements.diarizationMetrics` (via renderDiarizationStats after reset)
- [x] `elements.speakerBreakdown` (via renderDiarizationStats after reset)
- [x] `elements.summaryOutput` (NEW)
- [x] `elements.titleOutput` (NEW)
- [x] `elements.timelineSegments` (via renderTimelineSegments)
- [x] `elements.detectSpeakerNamesBtn` visibility (NEW)
## Testing Scenarios
### ✅ Test 1: Upload → Edit → Upload New
1. Upload "audio1.mp3"
2. Transcribe, edit speaker names to "Alice", "Bob"
3. Generate summary "Summary 1"
4. Upload "audio2.mp3"
5. **Verify:** Speaker names cleared, summary cleared, title cleared
6. Transcribe
7. **Verify:** Speaker tags show "Speaker 1", "Speaker 2" (not Alice/Bob)
### ✅ Test 2: YouTube → Summary → Podcast
1. Fetch YouTube audio
2. Transcribe, generate summary
3. Download podcast episode
4. **Verify:** YouTube summary cleared
5. Transcribe podcast
6. **Verify:** Independent transcript and summary
### ✅ Test 3: Podcast → Names → YouTube
1. Download podcast
2. Transcribe, detect speaker names
3. Fetch YouTube audio
4. **Verify:** Podcast speaker names cleared
5. Transcribe YouTube
6. **Verify:** No podcast names visible
### ✅ Test 4: Rapid Source Changes
1. Upload file
2. Immediately fetch YouTube (before transcription)
3. **Verify:** File data cleared, YouTube ready
4. Immediately download podcast
5. **Verify:** YouTube data cleared, podcast ready
### ✅ Test 5: Same Source Reload
1. Upload "audio.mp3", transcribe, edit
2. Upload same "audio.mp3" again
3. **Verify:** Previous edits cleared (fresh start)
### ✅ Test 6: Timeline Visualization
1. Upload audio, transcribe (timeline segments appear)
2. Upload different audio
3. **Verify:** Timeline segments cleared (empty)
4. Transcribe new audio
5. **Verify:** New timeline segments appear
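The manual scenarios above share one invariant: after any source change, no session data from the previous audio remains. The following sketch checks that invariant for every pair of source types, using plain-object stand-ins for `state`. The `loadSource()`, `simulateProcessing()`, and `sessionIsClean()` helpers are hypothetical and only mimic what the three handlers do after the fix; DOM checks are out of scope here.

```javascript
const state = {
  utterances: [],
  speakerNames: {},
  summary: '',
  title: '',
  audioUrl: null,
  uploadedFile: null,
};

// State-only version of the reset (UI clearing omitted in this sketch).
function resetCompleteSession() {
  state.utterances = [];
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
}

// Hypothetical stand-in for handleFileUpload / handleYoutubeFetch / downloadEpisode.
function loadSource(kind, name) {
  resetCompleteSession();
  state.uploadedFile = kind === 'upload' ? { name } : null;
  state.audioUrl = kind === 'upload' ? null : name;
}

// Pretend the user transcribed, renamed a speaker, and generated a summary.
function simulateProcessing() {
  state.utterances = [{ speaker: 0, text: 'hello' }];
  state.speakerNames = { 0: 'Alice' };
  state.summary = 'Summary 1';
  state.title = 'Title 1';
}

function sessionIsClean() {
  return state.utterances.length === 0
    && Object.keys(state.speakerNames).length === 0
    && state.summary === ''
    && state.title === '';
}

// Check every "from → to" transition between the three source types.
const kinds = ['upload', 'youtube', 'podcast'];
const results = {};
for (const from of kinds) {
  for (const to of kinds) {
    loadSource(from, `${from}-a`);
    simulateProcessing();
    loadSource(to, `${to}-b`);
    results[`${from} -> ${to}`] = sessionIsClean();
  }
}
console.log(results);
```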
## Performance Considerations
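- **resetCompleteSession():** A fixed number of state assignments plus DOM clearing; the only cost that grows is clearing `elements.transcriptList`, which scales with the number of rendered utterances
- **Called only on source change:** Infrequent user action
- **Impact:** Expected to be negligible next to the upload or download that triggers it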
## Backward Compatibility
- ✅ Existing `resetTranscriptionState()` unchanged
- ✅ New function adds capability, doesn't break existing code
- ✅ No API changes required
- ✅ No breaking changes to user workflow
## Implementation Checklist
- [ ] Create `resetCompleteSession()` function
- [ ] Update `handleFileUpload()` to call reset
- [ ] Update `handleYoutubeFetch()` to call reset
- [ ] Update `downloadEpisode()` to call reset
- [ ] Test all source change scenarios
- [ ] Verify UI elements cleared
- [ ] Verify no data contamination between sessions
- [ ] Update documentation
- [ ] Commit changes
## Related Bugs
- Bug 2.4.1: Manual speaker name propagation (Fixed)
- Bug 2.4.2: Auto-detection UI update (Fixed)
- Bug 2.4.3: Clear name to enable detection (Fixed)
- Bug 2.4.4: State persistence across audio files (This bug)
## Files to Modify
### `frontend/app.js`
- **New Function:** `resetCompleteSession()` (after line 273)
- **Modify:** `handleFileUpload()` (line ~1122)
- **Modify:** `handleYoutubeFetch()` (line ~1141)
- **Modify:** `downloadEpisode()` (line ~1239)
- **Impact:** ~40 lines added/modified
## Conclusion
The bug is caused by incomplete state reset when audio sources change. The solution is to create a comprehensive `resetCompleteSession()` function that clears ALL session data (transcription, speaker names, summary, title) and call it whenever a new audio source is loaded (file upload, YouTube, podcast). This ensures a clean slate for each audio file and prevents data contamination between sessions.