VoxSum / STATE_PERSISTENCE_BUG_FIX.md

Luigi

fix: reset session data when loading new audio source

0bfe5ff about 1 month ago

State Persistence Bug - Analysis and Fix

Date

October 1, 2025

Overview

Speaker names, summary, and title from a previously loaded audio file persist and are displayed incorrectly after a different audio source is loaded (file upload, YouTube, or podcast).

Problem Statement (Bug 2.4.4)

User Story

As a user, when I:

1. Load an audio file and transcribe it
2. Edit/detect speaker names (e.g., "Alice", "Bob")
3. Generate summary and title
4. Load a DIFFERENT audio file (upload, YouTube, podcast)

Expected: Clean slate - no speaker names, no summary, no title
Actual: Previous speaker names, summary, and title still visible

Visual Example

Scenario:

Step 1: Load "podcast_interview.mp3"
  - Transcribe → 2 speakers detected
  - Edit names: Speaker 0 = "Alice", Speaker 1 = "Bob"
  - Generate summary: "Interview about AI..."
  - Title: "AI Discussion with Alice"

Step 2: Load "meeting_recording.mp3" (different audio)
  - Audio player shows new file ✓
  - Transcript: EMPTY (not yet transcribed) ✓
  - Speaker names: Still shows "Alice", "Bob" from previous file ✗
  - Summary: Still shows "Interview about AI..." ✗
  - Title: Still shows "AI Discussion with Alice" ✗

Step 3: Transcribe new audio
  - New transcript appears with 3 speakers
  - Tags show: "Alice", "Bob", "Speaker 3" (mixed old/new!) ✗
  - Summary: Still old summary ✗

Impact

Confusion: Users see speaker names from different audio files
Data Integrity: Mixed data from multiple sessions
Trust Issue: Users can't trust the displayed information
UX Problem: Must manually clear/reset before each new file

Root Cause Analysis

Current State Management

State Object:

const state = {
  config: { moonshine: {}, sensevoice: {}, llms: {} },
  backend: 'sensevoice',
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: {},        // ❌ NOT reset when source changes
  summary: '',             // ❌ NOT reset when source changes
  title: '',               // ❌ NOT reset when source changes
  audioUrl: null,
  sourcePath: null,
  uploadedFile: null,
  transcribing: false,
  summarizing: false,
  detectingSpeakerNames: false,
  transcriptionController: null,
  summaryController: null,
};

Existing Reset Function

Location: frontend/app.js:resetTranscriptionState() (lines 265-273)

function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
  // ❌ MISSING: state.speakerNames = {};
  // ❌ MISSING: state.summary = '';
  // ❌ MISSING: state.title = '';
  // ❌ MISSING: Clear summary/title UI elements
}

Called only by: handleTranscription() (line 302)

Source Change Functions

Function 1: handleFileUpload() (lines 1119-1127)

function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
  // ❌ MISSING: No call to reset state
}

Function 2: handleYoutubeFetch() (lines 1129-1147)

async function handleYoutubeFetch() {
  // ... fetch logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('YouTube audio ready', 'success');
  // ❌ MISSING: No call to reset state
}

Function 3: downloadEpisode() (lines 1226-1258)

async function downloadEpisode(audioUrl, title, triggerButton = null) {
  // ... download logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('Episode ready', 'success');
  // ❌ MISSING: No call to reset state
}

Why It Happens

Problem Flow:

1. User loads Audio A
   → state.speakerNames, summary, title are empty

2. User transcribes Audio A
   → resetTranscriptionState() called (clears transcript, but NOT speaker names)
   → Transcription creates new utterances
   → state.speakerNames gets populated

3. User edits speaker names, generates summary
   → state.speakerNames = { 0: "Alice", 1: "Bob" }
   → state.summary = "Interview..."
   → state.title = "AI Discussion"

4. User loads Audio B (via upload, YouTube, or podcast)
   → handleFileUpload/handleYoutubeFetch/downloadEpisode called
   → Audio player source changed ✓
   → state.audioUrl/uploadedFile updated ✓
   → BUT state.speakerNames, summary, title NOT cleared ✗

5. User transcribes Audio B
   → resetTranscriptionState() called
   → Clears utterances, diarization stats ✓
   → BUT does NOT clear speakerNames, summary, title ✗
   → New transcription with old speaker names appears!
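
The flow above can be reproduced without a browser. The following is a minimal sketch, not the application's code: the DOM is left out, and `speakerLabel` is a hypothetical helper that assumes labels fall back to a 1-based "Speaker N" when no custom name exists (consistent with the tags shown in the scenario, but the real rendering logic may differ).

```javascript
// Minimal reproduction of the stale-state flow, DOM omitted.
const state = { utterances: [], speakerNames: {}, summary: '', title: '' };

// Pre-fix reset: clears transcript data only, not speaker names.
function resetTranscriptionState() {
  state.utterances = [];
}

// Assumed label rule (hypothetical): custom name if present, else "Speaker N".
function speakerLabel(speakerId) {
  return state.speakerNames[speakerId] ?? `Speaker ${speakerId + 1}`;
}

// Audio A: transcribe, then name the two detected speakers.
resetTranscriptionState();
state.utterances = [{ speaker: 0 }, { speaker: 1 }];
state.speakerNames = { 0: 'Alice', 1: 'Bob' };

// Audio B: three speakers, transcribed without any session reset.
resetTranscriptionState();
state.utterances = [{ speaker: 0 }, { speaker: 1 }, { speaker: 2 }];

const labels = state.utterances.map((u) => speakerLabel(u.speaker));
console.log(labels); // [ 'Alice', 'Bob', 'Speaker 3' ]
```

The names survive because they are keyed by speaker index, and speaker indices restart at 0 for every transcription.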

Solution Design

Design Principles

Complete Reset: Clear ALL session-specific data when source changes
Clear Intent: Reset should happen immediately when new source loaded
Separation of Concerns:
- Transcription reset: Clear transcription-related data
- Session reset: Clear ALL session data including summary, title, speaker names
Consistent Behavior: Same reset logic for all source types (upload, YouTube, podcast)

Two-Level Reset Strategy

Level 1: Reset Transcription Data (Existing)

When: Before starting new transcription
What: Utterances, diarization stats, transcript UI

function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}

Level 2: Reset Complete Session (NEW)

When: When new audio source is loaded
What: Everything from Level 1 + speaker names + summary + title

function resetCompleteSession() {
  // Transcription data (the Level 1 reset)
  resetTranscriptionState();
  
  // Speaker names
  state.speakerNames = {};
  
  // Summary and title (state and UI)
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  
  // Timeline segments
  renderTimelineSegments();  // Will be empty with no utterances
  
  // Optional: Hide detect speaker names button
  elements.detectSpeakerNamesBtn.classList.add('hidden');
}

Implementation

Change 1: Create resetCompleteSession() Function

File: frontend/app.js (after resetTranscriptionState())

function resetCompleteSession() {
  // Reset transcription data
  resetTranscriptionState();
  
  // Reset speaker names
  state.speakerNames = {};
  
  // Reset summary and title
  state.summary = '';
  state.title = '';
  
  // Clear summary and title UI
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  
  // Reset timeline visualization
  renderTimelineSegments();
  
  // Hide speaker name detection button
  elements.detectSpeakerNamesBtn.classList.add('hidden');
  
  // Reset status
  setStatus('Ready for new transcription', 'info');
}
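
To check the function outside the browser, it can be run against stand-in objects. The sketch below is a test harness under stated assumptions: `fakeElement` is a hypothetical stub that implements only the members the reset code touches, and `setStatus` and `renderTimelineSegments` are replaced by stubs rather than the real implementations.

```javascript
// Stand-in for a DOM element: only the members the reset code touches.
function fakeElement(text = '') {
  const classes = new Set();
  return {
    innerHTML: text,
    textContent: text,
    classList: {
      add: (name) => classes.add(name),
      contains: (name) => classes.has(name),
    },
  };
}

const elements = {
  transcriptList: fakeElement('<li>old utterance</li>'),
  utteranceCount: fakeElement('12 utterances'),
  diarizationPanel: fakeElement(),
  summaryOutput: fakeElement('Interview about AI...'),
  titleOutput: fakeElement('AI Discussion with Alice'),
  detectSpeakerNamesBtn: fakeElement(),
};

// State as it looks after the first audio file has been processed.
const state = {
  utterances: [{ speaker: 0, text: 'Hello' }],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: { 0: 'Alice', 1: 'Bob' },
  summary: 'Interview about AI...',
  title: 'AI Discussion with Alice',
};
let activeUtteranceIndex = 0;
let lastStatus = null;

// Stubs for app functions that are not under test here.
function setStatus(message, level) {
  lastStatus = { message, level };
}
function renderTimelineSegments() {}

function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}

function resetCompleteSession() {
  resetTranscriptionState();
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  renderTimelineSegments();
  elements.detectSpeakerNamesBtn.classList.add('hidden');
  setStatus('Ready for new transcription', 'info');
}

resetCompleteSession();
console.log(JSON.stringify({ names: state.speakerNames, summary: state.summary, title: state.title }));
// prints {"names":{},"summary":"","title":""}
```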

Change 2: Call Reset on File Upload

File: frontend/app.js:handleFileUpload() (lines ~1119-1127)

function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  
  // Reset complete session when new file loaded
  resetCompleteSession();
  
  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
}

Change 3: Call Reset on YouTube Fetch

File: frontend/app.js:handleYoutubeFetch() (lines ~1129-1147)

async function handleYoutubeFetch() {
  if (!elements.youtubeUrl.value.trim()) return;
  setStatus('Downloading audio from YouTube...', 'info');
  try {
    const res = await fetch('/api/youtube/fetch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: elements.youtubeUrl.value.trim() }),
    });
    if (!res.ok) throw new Error('YouTube download failed');
    const data = await res.json();
    
    // Reset complete session when new YouTube audio loaded
    resetCompleteSession();
    
    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('YouTube audio ready', 'success');
  } catch (err) {
    console.error(err);
    setStatus(err.message, 'error');
  }
}

Change 4: Call Reset on Podcast Episode Download

File: frontend/app.js:downloadEpisode() (lines ~1226-1258)

async function downloadEpisode(audioUrl, title, triggerButton = null) {
  setStatus('Downloading episode...', 'info');
  let originalLabel = null;
  if (triggerButton) {
    originalLabel = triggerButton.innerHTML;
    triggerButton.disabled = true;
    triggerButton.classList.add('loading');
    triggerButton.textContent = 'Downloading…';
  }
  try {
    const res = await fetch('/api/podcast/download', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ audioUrl, title }),
    });
    if (!res.ok) throw new Error('Episode download failed');
    const data = await res.json();
    
    // Reset complete session when new episode loaded
    resetCompleteSession();
    
    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('Episode ready', 'success');
    // ... rest of the function
  } catch (err) {
    // ... error handling
  }
}
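
Changes 2 to 4 repeat the same reset-then-assign sequence in three handlers. One possible refactoring, shown here only as a sketch, is to route every source type through a single helper so the reset cannot be forgotten when another source is added. `applyNewSource` and `loadRemoteSource` are hypothetical names, the reset is reduced to state fields, and the URL is a placeholder. As in the changes above, the reset runs only after a successful download, so a failed fetch leaves the previous session intact.

```javascript
const state = {
  audioUrl: null,
  uploadedFile: null,
  speakerNames: { 0: 'Alice' },
  summary: 'Old summary',
  title: 'Old title',
};
const elements = { audioPlayer: { src: '' } }; // stand-in for the <audio> element
const statusLog = [];

function setStatus(message, level) {
  statusLog.push(`${level}: ${message}`);
}

// Reduced stand-in for resetCompleteSession() (state only, no DOM).
function resetCompleteSession() {
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
}

// Hypothetical single entry point used by all source types.
function applyNewSource({ audioUrl = null, uploadedFile = null, playerSrc, message, level = 'success' }) {
  resetCompleteSession();
  state.audioUrl = audioUrl;
  state.uploadedFile = uploadedFile;
  elements.audioPlayer.src = playerSrc;
  setStatus(message, level);
}

// Shared wrapper for remote sources. `download` is any async function that
// resolves to { audioUrl } or throws.
async function loadRemoteSource(download, message) {
  try {
    const data = await download();
    applyNewSource({ audioUrl: data.audioUrl, playerSrc: data.audioUrl, message });
    return true;
  } catch (err) {
    setStatus(err.message, 'error'); // previous session is left untouched
    return false;
  }
}

async function demo() {
  const failed = await loadRemoteSource(async () => {
    throw new Error('Episode download failed');
  }, 'Episode ready');
  const summaryAfterFailure = state.summary;
  const succeeded = await loadRemoteSource(
    async () => ({ audioUrl: '/media/episode-2.mp3' }), // placeholder URL
    'Episode ready',
  );
  return { failed, succeeded, summaryAfterFailure, summaryAfterSuccess: state.summary };
}

const demoResult = demo();
demoResult.then((result) => console.log(result, statusLog));
```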

Behavior After Fix

Example Scenario

Step 1: Load and Process First Audio

1. Upload "interview.mp3"
   → resetCompleteSession() called
   → Clean slate: no utterances, speaker names, summary, title

2. Transcribe
   → resetTranscriptionState() called (redundant but harmless)
   → Transcript appears, 2 speakers detected

3. Edit speaker names
   → state.speakerNames = { 0: "Alice", 1: "Bob" }

4. Generate summary
   → state.summary = "Interview about AI..."
   → state.title = "AI Discussion"

Step 2: Load Different Audio

1. Upload "meeting.mp3"
   → resetCompleteSession() called ✓
   → state.speakerNames = {} (cleared) ✓
   → state.summary = '' (cleared) ✓
   → state.title = '' (cleared) ✓
   → Summary UI cleared ✓
   → Title UI cleared ✓
   → Timeline cleared ✓
   → Status: "Loaded meeting.mp3"

2. Transcribe
   → Fresh transcript with 3 speakers
   → Speaker tags show: "Speaker 1", "Speaker 2", "Speaker 3" ✓
   → No contamination from previous audio ✓

Step 3: Generate New Summary

1. Click "Generate Summary"
   → New summary generated for current audio ✓
   → Shown in the summary panel, which the reset already emptied ✓
   → New title generated ✓

Edge Cases

Edge Case 1: Upload Same File Twice

1. Upload "audio.mp3"
   → resetCompleteSession() called
2. Transcribe and edit
3. Upload same "audio.mp3" again
   → resetCompleteSession() called (data cleared)
   → User must transcribe again
   
Decision: Acceptable - user explicitly chose to reload
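
One caveat for this edge case: browsers generally do not fire a change event on a file input when the user picks the same file again, because the input's value has not changed. If that applies here, the second upload would never reach handleFileUpload(). A common workaround, sketched below, is to clear the input's value after reading the selection. `onFileChosen` is a hypothetical callback standing in for the rest of the handler, and the event object is a plain stand-in.

```javascript
// Variant of handleFileUpload() that clears the input after reading it, so
// re-selecting the same file still triggers a change event in the browser.
function handleFileUpload(event, onFileChosen) {
  const file = event.target.files?.[0];
  if (!file) return;
  onFileChosen(file); // reset the session, store the file, set the player src
  event.target.value = ''; // a file input accepts '' to clear its selection
}

// Stand-in for the browser event; a real one carries a FileList in files.
const fakeEvent = {
  target: { files: [{ name: 'audio.mp3' }], value: 'C:\\fakepath\\audio.mp3' },
};
const chosen = [];
handleFileUpload(fakeEvent, (file) => chosen.push(file.name));
console.log(chosen, JSON.stringify(fakeEvent.target.value));
```

The file reference is captured before the input is cleared, so the selected file stays usable afterwards.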

Edge Case 2: Change Source During Transcription

1. Start transcription of "audio1.mp3"
2. Mid-transcription, upload "audio2.mp3"
   → resetCompleteSession() called
   → Partial transcription cleared
   → New audio loaded

Decision: Acceptable - user action indicates intent to switch
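Note: Transcription abort handling already exists (state.transcriptionController), but resetCompleteSession() as shown does not invoke it. The in-flight request should be aborted when the source changes; otherwise late results may repopulate the cleared transcript.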
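
A minimal sketch of how the abort could be tied to the reset, assuming state.transcriptionController holds a standard AbortController. `abortInFlightTranscription` is a hypothetical helper and `fakeTranscribe` only simulates a streaming transcription with timers; in the application, the controller's signal would be passed to `fetch` through its `signal` option.

```javascript
const state = { utterances: [], transcribing: false, transcriptionController: null };

// Hypothetical guard, intended to run at the top of resetCompleteSession().
function abortInFlightTranscription() {
  if (state.transcriptionController) {
    state.transcriptionController.abort();
    state.transcriptionController = null;
  }
  state.transcribing = false;
}

// Simulated streaming transcription: one utterance per tick until aborted.
async function fakeTranscribe(signal, onUtterance) {
  for (let i = 0; i < 5; i += 1) {
    await new Promise((resolve) => setTimeout(resolve, 5));
    if (signal.aborted) return 'aborted';
    onUtterance({ text: `utterance ${i}` });
  }
  return 'done';
}

async function demo() {
  const controller = new AbortController();
  state.transcriptionController = controller;
  state.transcribing = true;
  const run = fakeTranscribe(controller.signal, (u) => state.utterances.push(u));

  // The user switches source mid-transcription.
  await new Promise((resolve) => setTimeout(resolve, 12));
  abortInFlightTranscription();
  state.utterances = []; // what resetTranscriptionState() does

  const outcome = await run;
  return { outcome, leftover: state.utterances.length };
}

const demoResult = demo();
demoResult.then((result) => console.log(result));
```

Without the abort, the simulated transcription would keep pushing utterances into the freshly cleared array.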

Edge Case 3: YouTube Fetch While Audio Playing

1. Upload file, play audio
2. Fetch YouTube audio
   → resetCompleteSession() called
   → Audio player source changed
   → Playback stops (normal behavior)

Decision: Acceptable - expected behavior when changing source

Edge Case 4: Multiple Podcast Episodes in Sequence

1. Download episode 1
   → resetCompleteSession()
2. Transcribe episode 1
3. Download episode 2
   → resetCompleteSession() (episode 1 data cleared)
4. Transcribe episode 2

Decision: Correct behavior - each episode is independent

UI Elements to Reset

Complete Checklist

State Variables:

state.utterances (via resetTranscriptionState)
state.diarizedUtterances (via resetTranscriptionState)
state.diarizationStats (via resetTranscriptionState)
state.speakerNames (NEW)
state.summary (NEW)
state.title (NEW)
activeUtteranceIndex (via resetTranscriptionState)

DOM Elements:

elements.transcriptList (via resetTranscriptionState)
elements.utteranceCount (via resetTranscriptionState)
elements.diarizationPanel (via resetTranscriptionState)
elements.diarizationMetrics (via renderDiarizationStats after reset)
elements.speakerBreakdown (via renderDiarizationStats after reset)
elements.summaryOutput (NEW)
elements.titleOutput (NEW)
elements.timelineSegments (via renderTimelineSegments)
elements.detectSpeakerNamesBtn visibility (NEW)
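
The state half of this checklist can also be kept in code rather than in prose. The sketch below is one possible approach, not the implemented fix: a hypothetical `createSessionDefaults` factory lists every per-audio field once, the reset copies those defaults back, and a small check reports any field that still holds data.

```javascript
// Hypothetical factory: every per-audio field with its empty value.
// Returning a fresh object each time avoids sharing arrays between sessions.
function createSessionDefaults() {
  return {
    utterances: [],
    diarizedUtterances: null,
    diarizationStats: null,
    speakerNames: {},
    summary: '',
    title: '',
  };
}

const state = {
  backend: 'sensevoice', // app-level setting: must survive a reset
  audioUrl: null,
  ...createSessionDefaults(),
};

function resetSessionFields() {
  Object.assign(state, createSessionDefaults());
}

// Returns the names of session fields that differ from their defaults.
function findDirtySessionFields() {
  const defaults = createSessionDefaults();
  return Object.keys(defaults).filter(
    (key) => JSON.stringify(state[key]) !== JSON.stringify(defaults[key]),
  );
}

state.speakerNames = { 0: 'Alice' };
state.summary = 'Interview about AI...';
const dirtyBefore = findDirtySessionFields();
resetSessionFields();
const dirtyAfter = findDirtySessionFields();
console.log(dirtyBefore, dirtyAfter);
```

A field added to the factory later is picked up by both the reset and the check without further changes.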

Testing Scenarios

✅ Test 1: Upload → Edit → Upload New

Upload "audio1.mp3"
Transcribe, edit speaker names to "Alice", "Bob"
Generate summary "Summary 1"
Upload "audio2.mp3"
Verify: Speaker names cleared, summary cleared, title cleared
Transcribe
Verify: Speaker tags show "Speaker 1", "Speaker 2" (not Alice/Bob)

✅ Test 2: YouTube → Summary → Podcast

Fetch YouTube audio
Transcribe, generate summary
Download podcast episode
Verify: YouTube summary cleared
Transcribe podcast
Verify: Independent transcript and summary

✅ Test 3: Podcast → Names → YouTube

Download podcast
Transcribe, detect speaker names
Fetch YouTube audio
Verify: Podcast speaker names cleared
Transcribe YouTube
Verify: No podcast names visible

✅ Test 4: Rapid Source Changes

Upload file
Immediately fetch YouTube (before transcription)
Verify: File data cleared, YouTube ready
Immediately download podcast
Verify: YouTube data cleared, podcast ready

✅ Test 5: Same Source Reload

Upload "audio.mp3", transcribe, edit
Upload same "audio.mp3" again
Verify: Previous edits cleared (fresh start)

✅ Test 6: Timeline Visualization

Upload audio, transcribe (timeline segments appear)
Upload different audio
Verify: Timeline segments cleared (empty)
Transcribe new audio
Verify: New timeline segments appear

Performance Considerations

resetCompleteSession(): State assignments are O(1); clearing the transcript DOM scales with the number of rendered utterances
Called only on source change: Infrequent user action
Impact: Negligible (<1ms)

Backward Compatibility

✅ Existing resetTranscriptionState() unchanged
✅ New function adds capability, doesn't break existing code
✅ No API changes required
✅ No breaking changes to user workflow

Implementation Checklist

Create resetCompleteSession() function
Update handleFileUpload() to call reset
Update handleYoutubeFetch() to call reset
Update downloadEpisode() to call reset
Test all source change scenarios
Verify UI elements cleared
Verify no data contamination between sessions
Update documentation
Commit changes

Related Bugs

Bug 2.4.1: Manual speaker name propagation (Fixed)
Bug 2.4.2: Auto-detection UI update (Fixed)
Bug 2.4.3: Clear name to enable detection (Fixed)
Bug 2.4.4: State persistence across audio files (This bug)

Files to Modify

/home/luigi/VoxSum/frontend/app.js

New Function: resetCompleteSession() (after line 273)
Modify: handleFileUpload() (line ~1122)
Modify: handleYoutubeFetch() (line ~1141)
Modify: downloadEpisode() (line ~1239)
Impact: ~40 lines added/modified

Conclusion

The bug is caused by incomplete state reset when audio sources change. The solution is to create a comprehensive resetCompleteSession() function that clears ALL session data (transcription, speaker names, summary, title) and call it whenever a new audio source is loaded (file upload, YouTube, podcast). This ensures a clean slate for each audio file and prevents data contamination between sessions.