	# State Persistence Bug - Analysis and Fix

	## Date
	October 1, 2025

	## Overview
	Speaker names, the summary, and the title from the first audio file persist and are incorrectly displayed after a different audio source is loaded (file upload, YouTube, or podcast).

	## Problem Statement (Bug 2.4.4)

	### User Story
	As a user, when I:
	1. Load an audio file and transcribe it
	2. Edit/detect speaker names (e.g., "Alice", "Bob")
	3. Generate summary and title
	4. Load a DIFFERENT audio file (upload, YouTube, podcast)

	- Expected: Clean slate - no speaker names, no summary, no title
	- Actual: Previous speaker names, summary, and title still visible

	### Visual Example

	Scenario:
	```
	Step 1: Load "podcast_interview.mp3"
	- Transcribe → 2 speakers detected
	- Edit names: Speaker 0 = "Alice", Speaker 1 = "Bob"
	- Generate summary: "Interview about AI..."
	- Title: "AI Discussion with Alice"

	Step 2: Load "meeting_recording.mp3" (different audio)
	- Audio player shows new file ✓
	- Transcript: EMPTY (not yet transcribed) ✓
	- Speaker names: Still shows "Alice", "Bob" from previous file ✗
	- Summary: Still shows "Interview about AI..." ✗
	- Title: Still shows "AI Discussion with Alice" ✗

	Step 3: Transcribe new audio
	- New transcript appears with 3 speakers
	- Tags show: "Alice", "Bob", "Speaker 3" (mixed old/new!) ✗
	- Summary: Still old summary ✗
	```

	### Impact
	- Confusion: Users see speaker names from different audio files
	- Data Integrity: Mixed data from multiple sessions
	- Trust Issue: Users can't trust the displayed information
	- UX Problem: Must manually clear/reset before each new file

	## Root Cause Analysis

	### Current State Management

	State Object:
	```javascript
	const state = {
	  config: { moonshine: {}, sensevoice: {}, llms: {} },
	  backend: 'sensevoice',
	  utterances: [],
	  diarizedUtterances: null,
	  diarizationStats: null,
	  speakerNames: {}, // ❌ NOT reset when source changes
	  summary: '', // ❌ NOT reset when source changes
	  title: '', // ❌ NOT reset when source changes
	  audioUrl: null,
	  sourcePath: null,
	  uploadedFile: null,
	  transcribing: false,
	  summarizing: false,
	  detectingSpeakerNames: false,
	  transcriptionController: null,
	  summaryController: null,
	};
	```

	### Existing Reset Function

	Location: `frontend/app.js:resetTranscriptionState()` (lines 265-273)

	```javascript
	function resetTranscriptionState() {
	  state.utterances = [];
	  state.diarizedUtterances = null;
	  state.diarizationStats = null;
	  activeUtteranceIndex = -1;
	  elements.transcriptList.innerHTML = '';
	  elements.utteranceCount.textContent = '';
	  elements.diarizationPanel.classList.add('hidden');
	  // ❌ MISSING: state.speakerNames = {};
	  // ❌ MISSING: state.summary = '';
	  // ❌ MISSING: state.title = '';
	  // ❌ MISSING: Clear summary/title UI elements
	}
	```

	Called only by: `handleTranscription()` (line 302)

	### Source Change Functions

	#### Function 1: `handleFileUpload()` (lines 1119-1127)
	```javascript
	function handleFileUpload(event) {
	  const file = event.target.files?.[0];
	  if (!file) return;
	  state.uploadedFile = file;
	  state.audioUrl = null;
	  const objectUrl = URL.createObjectURL(file);
	  elements.audioPlayer.src = objectUrl;
	  setStatus(`Loaded ${file.name}`, 'info');
	  // ❌ MISSING: No call to reset state
	}
	```

	#### Function 2: `handleYoutubeFetch()` (lines 1129-1147)
	```javascript
	async function handleYoutubeFetch() {
	  // ... fetch logic ...
	  state.audioUrl = data.audioUrl;
	  state.uploadedFile = null;
	  elements.audioPlayer.src = data.audioUrl;
	  setStatus('YouTube audio ready', 'success');
	  // ❌ MISSING: No call to reset state
	}
	```

	#### Function 3: `downloadEpisode()` (lines 1226-1258)
	```javascript
	async function downloadEpisode(audioUrl, title, triggerButton = null) {
	  // ... download logic ...
	  state.audioUrl = data.audioUrl;
	  state.uploadedFile = null;
	  elements.audioPlayer.src = data.audioUrl;
	  setStatus('Episode ready', 'success');
	  // ❌ MISSING: No call to reset state
	}
	```

	### Why It Happens

	Problem Flow:
	```
	1. User loads Audio A
	→ state.speakerNames, summary, title are empty

	2. User transcribes Audio A
	→ resetTranscriptionState() called (clears transcript, but NOT speaker names)
	→ Transcription creates new utterances
	→ state.speakerNames gets populated

	3. User edits speaker names, generates summary
	→ state.speakerNames = { 0: "Alice", 1: "Bob" }
	→ state.summary = "Interview..."
	→ state.title = "AI Discussion"

	4. User loads Audio B (via upload, YouTube, or podcast)
	→ handleFileUpload/handleYoutubeFetch/downloadEpisode called
	→ Audio player source changed ✓
	→ state.audioUrl/uploadedFile updated ✓
	→ BUT state.speakerNames, summary, title NOT cleared ✗

	5. User transcribes Audio B
	→ resetTranscriptionState() called
	→ Clears utterances, diarization stats ✓
	→ BUT does NOT clear speakerNames, summary, title ✗
	→ New transcription with old speaker names appears!
	```
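
The flow above can be reproduced without a browser. The following is a minimal sketch, assuming a stripped-down state object and simplified stand-ins for the pre-fix functions; `createState`, `loadSource`, and `reproduceBug` are hypothetical names used only for this illustration, and the DOM updates are left out.

```javascript
// DOM-free reproduction of the stale-state bug (illustrative harness only).
function createState() {
  return {
    utterances: [],
    diarizedUtterances: null,
    diarizationStats: null,
    speakerNames: {},
    summary: '',
    title: '',
    audioUrl: null,
  };
}

// Mirrors the pre-fix reset: it clears transcription data only.
function resetTranscriptionState(state) {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
}

// Mirrors the pre-fix source handlers: swap the source, reset nothing.
function loadSource(state, audioUrl) {
  state.audioUrl = audioUrl;
}

function reproduceBug() {
  const state = createState();

  // Audio A: load, transcribe, name speakers, summarize.
  loadSource(state, 'audio-a.mp3');
  resetTranscriptionState(state);
  state.utterances = [{ speaker: 0, text: 'Hello' }, { speaker: 1, text: 'Hi' }];
  state.speakerNames = { 0: 'Alice', 1: 'Bob' };
  state.summary = 'Interview about AI...';
  state.title = 'AI Discussion';

  // Audio B: load it and start a new transcription.
  loadSource(state, 'audio-b.mp3');
  resetTranscriptionState(state);

  return state;
}

const after = reproduceBug();
console.log(after.utterances.length); // 0: the transcript was cleared
console.log(after.speakerNames);      // still holds the Audio A names
console.log(after.summary);           // still holds the Audio A summary
```

Neither the source change nor the transcription reset touches `speakerNames`, `summary`, or `title`, which is why the old values survive both steps.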

	## Solution Design

	### Design Principles
	1. Complete Reset: Clear ALL session-specific data when source changes
	2. Clear Intent: Reset should happen immediately when new source loaded
	3. Separation of Concerns:
	   - Transcription reset: Clear transcription-related data
	   - Session reset: Clear ALL session data including summary, title, speaker names
	4. Consistent Behavior: Same reset logic for all source types (upload, YouTube, podcast)

	### Two-Level Reset Strategy

	#### Level 1: Reset Transcription Data (Existing)
	When: Before starting new transcription
	What: Utterances, diarization stats, transcript UI

	```javascript
	function resetTranscriptionState() {
	  state.utterances = [];
	  state.diarizedUtterances = null;
	  state.diarizationStats = null;
	  activeUtteranceIndex = -1;
	  elements.transcriptList.innerHTML = '';
	  elements.utteranceCount.textContent = '';
	  elements.diarizationPanel.classList.add('hidden');
	}
	```

	#### Level 2: Reset Complete Session (NEW)
	When: When new audio source is loaded
	What: Everything from Level 1 + speaker names + summary + title

	```javascript
	function resetCompleteSession() {
	  // Level 1: Reset transcription data (utterances, diarization, transcript UI)
	  resetTranscriptionState();

	  // Level 2: Reset the remaining session data, starting with speaker names
	  state.speakerNames = {};

	  // Summary and title (state and UI)
	  state.summary = '';
	  state.title = '';
	  elements.summaryOutput.innerHTML = '';
	  elements.titleOutput.textContent = '';

	  // Timeline segments
	  renderTimelineSegments(); // Will be empty with no utterances

	  // Optional: Hide detect speaker names button
	  elements.detectSpeakerNamesBtn.classList.add('hidden');
	}
	```

	## Implementation

	### Change 1: Create `resetCompleteSession()` Function

	File: `frontend/app.js` (after `resetTranscriptionState()`)

	```javascript
	function resetCompleteSession() {
	  // Reset transcription data
	  resetTranscriptionState();

	  // Reset speaker names
	  state.speakerNames = {};

	  // Reset summary and title
	  state.summary = '';
	  state.title = '';

	  // Clear summary and title UI
	  elements.summaryOutput.innerHTML = '';
	  elements.titleOutput.textContent = '';

	  // Reset timeline visualization
	  renderTimelineSegments();

	  // Hide speaker name detection button
	  elements.detectSpeakerNamesBtn.classList.add('hidden');

	  // Reset status
	  setStatus('Ready for new transcription', 'info');
	}
	```

	### Change 2: Call Reset on File Upload

	File: `frontend/app.js:handleFileUpload()` (lines ~1119-1127)

	```javascript
	function handleFileUpload(event) {
	  const file = event.target.files?.[0];
	  if (!file) return;

	  // Reset complete session when new file loaded
	  resetCompleteSession();

	  state.uploadedFile = file;
	  state.audioUrl = null;
	  const objectUrl = URL.createObjectURL(file);
	  elements.audioPlayer.src = objectUrl;
	  setStatus(`Loaded ${file.name}`, 'info');
	}
	```

	### Change 3: Call Reset on YouTube Fetch

	File: `frontend/app.js:handleYoutubeFetch()` (lines ~1129-1147)

	```javascript
	async function handleYoutubeFetch() {
	  if (!elements.youtubeUrl.value.trim()) return;
	  setStatus('Downloading audio from YouTube...', 'info');
	  try {
	    const res = await fetch('/api/youtube/fetch', {
	      method: 'POST',
	      headers: { 'Content-Type': 'application/json' },
	      body: JSON.stringify({ url: elements.youtubeUrl.value.trim() }),
	    });
	    if (!res.ok) throw new Error('YouTube download failed');
	    const data = await res.json();

	    // Reset complete session when new YouTube audio loaded
	    resetCompleteSession();

	    state.audioUrl = data.audioUrl;
	    state.uploadedFile = null;
	    elements.audioPlayer.src = data.audioUrl;
	    setStatus('YouTube audio ready', 'success');
	  } catch (err) {
	    console.error(err);
	    setStatus(err.message, 'error');
	  }
	}
	```

	### Change 4: Call Reset on Podcast Episode Download

	File: `frontend/app.js:downloadEpisode()` (lines ~1226-1258)

	```javascript
	async function downloadEpisode(audioUrl, title, triggerButton = null) {
	  setStatus('Downloading episode...', 'info');
	  let originalLabel = null;
	  if (triggerButton) {
	    originalLabel = triggerButton.innerHTML;
	    triggerButton.disabled = true;
	    triggerButton.classList.add('loading');
	    triggerButton.textContent = 'Downloading…';
	  }
	  try {
	    const res = await fetch('/api/podcast/download', {
	      method: 'POST',
	      headers: { 'Content-Type': 'application/json' },
	      body: JSON.stringify({ audioUrl, title }),
	    });
	    if (!res.ok) throw new Error('Episode download failed');
	    const data = await res.json();

	    // Reset complete session when new episode loaded
	    resetCompleteSession();

	    state.audioUrl = data.audioUrl;
	    state.uploadedFile = null;
	    elements.audioPlayer.src = data.audioUrl;
	    setStatus('Episode ready', 'success');
	    // ... rest of the function
	  } catch (err) {
	    // ... error handling
	  }
	}
	```
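
Changes 2-4 all end with the same sequence: reset, update `state.audioUrl` and `state.uploadedFile`, point the player at the new source, set the status. One possible way to keep that sequence consistent across source types is to route it through a single helper. The sketch below is an illustration only: `activateAudioSource` is a hypothetical name, and `state`, `elements`, `setStatus`, and `resetCompleteSession` are simplified stand-ins so the snippet runs outside the browser.

```javascript
// Stand-ins so the sketch runs outside the browser (assumptions, not app code).
const state = {
  speakerNames: { 0: 'Alice' },
  summary: 'old summary',
  title: 'old title',
  audioUrl: null,
  uploadedFile: null,
};
const elements = { audioPlayer: { src: '' } };
const statusLog = [];
function setStatus(message, level) {
  statusLog.push({ message, level });
}

// Simplified stand-in for the resetCompleteSession() described in this document.
function resetCompleteSession() {
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
}

// Hypothetical helper: the only place where the active audio source changes.
function activateAudioSource({ playerSrc, audioUrl = null, uploadedFile = null, statusMessage, level = 'info' }) {
  resetCompleteSession();
  state.audioUrl = audioUrl;
  state.uploadedFile = uploadedFile;
  elements.audioPlayer.src = playerSrc;
  setStatus(statusMessage, level);
}

// Each handler would then reduce to one call once its source is ready.
activateAudioSource({
  playerSrc: '/media/episode.mp3',
  audioUrl: '/media/episode.mp3',
  statusMessage: 'Episode ready',
  level: 'success',
});
```

With this shape, a future fourth source type cannot forget the reset, because it has no other way to change the source.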

	## Behavior After Fix

	### Example Scenario

	Step 1: Load and Process First Audio
	```
	1. Upload "interview.mp3"
	→ resetCompleteSession() called
	→ Clean slate: no utterances, speaker names, summary, title

	2. Transcribe
	→ resetTranscriptionState() called (redundant but harmless)
	→ Transcript appears, 2 speakers detected

	3. Edit speaker names
	→ state.speakerNames = { 0: "Alice", 1: "Bob" }

	4. Generate summary
	→ state.summary = "Interview about AI..."
	→ state.title = "AI Discussion"
	```

	Step 2: Load Different Audio
	```
	1. Upload "meeting.mp3"
	→ resetCompleteSession() called ✓
	→ state.speakerNames = {} (cleared) ✓
	→ state.summary = '' (cleared) ✓
	→ state.title = '' (cleared) ✓
	→ Summary UI cleared ✓
	→ Title UI cleared ✓
	→ Timeline cleared ✓
	→ Status: "Loaded meeting.mp3"

	2. Transcribe
	→ Fresh transcript with 3 speakers
	→ Speaker tags show: "Speaker 1", "Speaker 2", "Speaker 3" ✓
	→ No contamination from previous audio ✓
	```

	Step 3: Generate New Summary
	```
	1. Click "Generate Summary"
	→ New summary generated for current audio ✓
	→ Fills the summary area, which was already cleared when the source changed ✓
	→ New title generated ✓
	```

	## Edge Cases

	### Edge Case 1: Upload Same File Twice
	```
	1. Upload "audio.mp3"
	→ resetCompleteSession() called
	2. Transcribe and edit
	3. Upload same "audio.mp3" again
	→ resetCompleteSession() called (data cleared)
	→ User must transcribe again

	Decision: Acceptable - user explicitly chose to reload
	```

	### Edge Case 2: Change Source During Transcription
	```
	1. Start transcription of "audio1.mp3"
	2. Mid-transcription, upload "audio2.mp3"
	→ resetCompleteSession() called
	→ Partial transcription cleared
	→ New audio loaded

	Decision: Acceptable - user action indicates intent to switch
	Note: Transcription abort handling already exists
	```
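
Edge Case 2 depends on the existing abort handling, since the `resetCompleteSession()` shown in this document does not itself touch `state.transcriptionController` or `state.summaryController`. If in-flight results could still arrive after the reset, they would repopulate the cleared state. The following is a minimal sketch of cancelling in-flight work before a reset, assuming both controller fields hold standard `AbortController` instances (or `null`); `abortInFlightWork` is a hypothetical helper, not a function from the app.

```javascript
// Stand-in state; in the app these fields live on the shared `state` object.
const state = {
  transcribing: true,
  summarizing: false,
  transcriptionController: new AbortController(),
  summaryController: null,
};

// Hypothetical helper: cancel any in-flight transcription or summary request.
// Assumes both controller fields hold AbortController instances (or null).
function abortInFlightWork() {
  for (const key of ['transcriptionController', 'summaryController']) {
    const controller = state[key];
    if (controller) {
      controller.abort(); // a fetch() given controller.signal rejects with an AbortError
      state[key] = null;
    }
  }
  state.transcribing = false;
  state.summarizing = false;
}

const signal = state.transcriptionController.signal;
abortInFlightWork();
console.log(signal.aborted); // true
```

Whether this call belongs inside `resetCompleteSession()` or in the source handlers depends on how the existing abort handling manages the `transcribing` flag, which is not shown here.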

	### Edge Case 3: YouTube Fetch While Audio Playing
	```
	1. Upload file, play audio
	2. Fetch YouTube audio
	→ resetCompleteSession() called
	→ Audio player source changed
	→ Playback stops (normal behavior)

	Decision: Acceptable - expected behavior when changing source
	```

	### Edge Case 4: Multiple Podcast Episodes in Sequence
	```
	1. Download episode 1
	→ resetCompleteSession()
	2. Transcribe episode 1
	3. Download episode 2
	→ resetCompleteSession() (episode 1 data cleared)
	4. Transcribe episode 2

	Decision: Correct behavior - each episode is independent
	```

	## UI Elements to Reset

	### Complete Checklist

	State Variables:
	- [x] `state.utterances` (via resetTranscriptionState)
	- [x] `state.diarizedUtterances` (via resetTranscriptionState)
	- [x] `state.diarizationStats` (via resetTranscriptionState)
	- [x] `state.speakerNames` (NEW)
	- [x] `state.summary` (NEW)
	- [x] `state.title` (NEW)
	- [x] `activeUtteranceIndex` (via resetTranscriptionState)

	DOM Elements:
	- [x] `elements.transcriptList` (via resetTranscriptionState)
	- [x] `elements.utteranceCount` (via resetTranscriptionState)
	- [x] `elements.diarizationPanel` (via resetTranscriptionState)
	- [x] `elements.diarizationMetrics` (via renderDiarizationStats after reset)
	- [x] `elements.speakerBreakdown` (via renderDiarizationStats after reset)
	- [x] `elements.summaryOutput` (NEW)
	- [x] `elements.titleOutput` (NEW)
	- [x] `elements.timelineSegments` (via renderTimelineSegments)
	- [x] `elements.detectSpeakerNamesBtn` visibility (NEW)
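
The state half of this checklist could also be checked mechanically. The sketch below assumes the per-source defaults are declared in one place; `createSessionDefaults` and `findStaleSessionFields` are hypothetical helpers, and the comparison uses `JSON.stringify`, which is adequate only because these fields hold plain JSON-like values.

```javascript
// Hypothetical single source of truth for the per-source state fields.
function createSessionDefaults() {
  return {
    utterances: [],
    diarizedUtterances: null,
    diarizationStats: null,
    speakerNames: {},
    summary: '',
    title: '',
  };
}

// Returns the names of session fields that still differ from their defaults.
function findStaleSessionFields(state) {
  const defaults = createSessionDefaults();
  return Object.keys(defaults).filter(
    (key) => JSON.stringify(state[key]) !== JSON.stringify(defaults[key])
  );
}

const dirty = { ...createSessionDefaults(), speakerNames: { 0: 'Alice' }, summary: 'old' };
console.log(findStaleSessionFields(dirty)); // [ 'speakerNames', 'summary' ]
console.log(findStaleSessionFields(createSessionDefaults())); // []
```

Calling such a check right after a reset in a test would flag any session field added later but forgotten in the reset function.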

	## Testing Scenarios

	### ✅ Test 1: Upload → Edit → Upload New
	1. Upload "audio1.mp3"
	2. Transcribe, edit speaker names to "Alice", "Bob"
	3. Generate summary "Summary 1"
	4. Upload "audio2.mp3"
	5. Verify: Speaker names cleared, summary cleared, title cleared
	6. Transcribe
	7. Verify: Speaker tags show "Speaker 1", "Speaker 2" (not Alice/Bob)

	### ✅ Test 2: YouTube → Summary → Podcast
	1. Fetch YouTube audio
	2. Transcribe, generate summary
	3. Download podcast episode
	4. Verify: YouTube summary cleared
	5. Transcribe podcast
	6. Verify: Independent transcript and summary

	### ✅ Test 3: Podcast → Names → YouTube
	1. Download podcast
	2. Transcribe, detect speaker names
	3. Fetch YouTube audio
	4. Verify: Podcast speaker names cleared
	5. Transcribe YouTube
	6. Verify: No podcast names visible

	### ✅ Test 4: Rapid Source Changes
	1. Upload file
	2. Immediately fetch YouTube (before transcription)
	3. Verify: File data cleared, YouTube ready
	4. Immediately download podcast
	5. Verify: YouTube data cleared, podcast ready
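
Test 4 exercises a case where two downloads may be in flight at once. Because the YouTube and podcast handlers reset and apply their source only after the network response arrives, a slow earlier response could in principle overwrite a faster later one. The sketch below shows one common guard, a request counter, under the assumption that only the most recently requested source should win; `loadLatestOnly`, `fetchAudio`, and `applySource` are hypothetical names.

```javascript
// Hypothetical guard: only the most recently requested source may be applied.
let latestRequestId = 0;
const applied = [];

// `fetchAudio` stands in for the YouTube/podcast download call (assumption).
async function loadLatestOnly(fetchAudio, applySource) {
  const requestId = ++latestRequestId;
  const data = await fetchAudio();
  if (requestId !== latestRequestId) return false; // superseded by a newer request
  applySource(data);
  return true;
}

// Resolves with `value` after `ms` milliseconds, simulating network latency.
const delay = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function demo() {
  const record = (data) => applied.push(data.audioUrl);
  // The first request is slow, the second one is fast and issued right after.
  const slow = loadLatestOnly(() => delay(50, { audioUrl: 'youtube.mp3' }), record);
  const fast = loadLatestOnly(() => delay(10, { audioUrl: 'podcast.mp3' }), record);
  return Promise.all([slow, fast]);
}
```

In the app, `applySource` would be the point where `resetCompleteSession()` runs and the player source is updated.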

	### ✅ Test 5: Same Source Reload
	1. Upload "audio.mp3", transcribe, edit
	2. Upload same "audio.mp3" again
	3. Verify: Previous edits cleared (fresh start)

	### ✅ Test 6: Timeline Visualization
	1. Upload audio, transcribe (timeline segments appear)
	2. Upload different audio
	3. Verify: Timeline segments cleared (empty)
	4. Transcribe new audio
	5. Verify: New timeline segments appear

	## Performance Considerations
	- resetCompleteSession(): O(1) state resets; clearing the transcript DOM is linear in the number of rendered utterances
	- Called only on source change: Infrequent user action
	- Impact: Negligible (<1ms)

	## Backward Compatibility
	- ✅ Existing `resetTranscriptionState()` unchanged
	- ✅ New function adds capability, doesn't break existing code
	- ✅ No API changes required
	- ✅ No breaking changes to user workflow

	## Implementation Checklist

	- [ ] Create `resetCompleteSession()` function
	- [ ] Update `handleFileUpload()` to call reset
	- [ ] Update `handleYoutubeFetch()` to call reset
	- [ ] Update `downloadEpisode()` to call reset
	- [ ] Test all source change scenarios
	- [ ] Verify UI elements cleared
	- [ ] Verify no data contamination between sessions
	- [ ] Update documentation
	- [ ] Commit changes

	## Related Bugs
	- Bug 2.4.1: Manual speaker name propagation (Fixed)
	- Bug 2.4.2: Auto-detection UI update (Fixed)
	- Bug 2.4.3: Clear name to enable detection (Fixed)
	- Bug 2.4.4: State persistence across audio files (This bug)

	## Files to Modify

	### `frontend/app.js`
	- New Function: `resetCompleteSession()` (after line 273)
	- Modify: `handleFileUpload()` (line ~1122)
	- Modify: `handleYoutubeFetch()` (line ~1141)
	- Modify: `downloadEpisode()` (line ~1239)
	- Impact: ~40 lines added/modified

	## Conclusion
	The bug is caused by incomplete state reset when audio sources change. The solution is to create a comprehensive `resetCompleteSession()` function that clears ALL session data (transcription, speaker names, summary, title) and call it whenever a new audio source is loaded (file upload, YouTube, podcast). This ensures a clean slate for each audio file and prevents data contamination between sessions.