
State Persistence Bug - Analysis and Fix

Date

October 1, 2025

Overview

Speaker names, the summary, and the title from the first audio file persist and are displayed incorrectly after a different audio source is loaded (file upload, YouTube, or podcast).

Problem Statement (Bug 2.4.4)

User Story

As a user, when I:

  1. Load an audio file and transcribe it
  2. Edit/detect speaker names (e.g., "Alice", "Bob")
  3. Generate summary and title
  4. Load a DIFFERENT audio file (upload, YouTube, podcast)
  5. Expected: Clean slate - no speaker names, no summary, no title
  6. Actual: Previous speaker names, summary, and title still visible

Visual Example

Scenario:

Step 1: Load "podcast_interview.mp3"
  - Transcribe → 2 speakers detected
  - Edit names: Speaker 0 = "Alice", Speaker 1 = "Bob"
  - Generate summary: "Interview about AI..."
  - Title: "AI Discussion with Alice"

Step 2: Load "meeting_recording.mp3" (different audio)
  - Audio player shows new file ✓
  - Transcript: EMPTY (not yet transcribed) ✓
  - Speaker names: Still shows "Alice", "Bob" from previous file ✗
  - Summary: Still shows "Interview about AI..." ✗
  - Title: Still shows "AI Discussion with Alice" ✗

Step 3: Transcribe new audio
  - New transcript appears with 3 speakers
  - Tags show: "Alice", "Bob", "Speaker 3" (mixed old/new!) ✗
  - Summary: Still old summary ✗

Impact

  • Confusion: Users see speaker names from different audio files
  • Data Integrity: Mixed data from multiple sessions
  • Trust Issue: Users can't trust the displayed information
  • UX Problem: Must manually clear/reset before each new file

Root Cause Analysis

Current State Management

State Object:

const state = {
  config: { moonshine: {}, sensevoice: {}, llms: {} },
  backend: 'sensevoice',
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: {},        // ❌ NOT reset when source changes
  summary: '',             // ❌ NOT reset when source changes
  title: '',               // ❌ NOT reset when source changes
  audioUrl: null,
  sourcePath: null,
  uploadedFile: null,
  transcribing: false,
  summarizing: false,
  detectingSpeakerNames: false,
  transcriptionController: null,
  summaryController: null,
};

Existing Reset Function

Location: frontend/app.js:resetTranscriptionState() (lines 265-273)

function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
  // ❌ MISSING: state.speakerNames = {};
  // ❌ MISSING: state.summary = '';
  // ❌ MISSING: state.title = '';
  // ❌ MISSING: Clear summary/title UI elements
}

Called only by: handleTranscription() (line 302)

Source Change Functions

Function 1: handleFileUpload() (lines 1119-1127)

function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
  // ❌ MISSING: No call to reset state
}

Function 2: handleYoutubeFetch() (lines 1129-1147)

async function handleYoutubeFetch() {
  // ... fetch logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('YouTube audio ready', 'success');
  // ❌ MISSING: No call to reset state
}

Function 3: downloadEpisode() (lines 1226-1258)

async function downloadEpisode(audioUrl, title, triggerButton = null) {
  // ... download logic ...
  state.audioUrl = data.audioUrl;
  state.uploadedFile = null;
  elements.audioPlayer.src = data.audioUrl;
  setStatus('Episode ready', 'success');
  // ❌ MISSING: No call to reset state
}

Why It Happens

Problem Flow:

1. User loads Audio A
   → state.speakerNames, summary, title are empty

2. User transcribes Audio A
   → resetTranscriptionState() called (clears transcript, but NOT speaker names)
   → Transcription creates new utterances
   → state.speakerNames gets populated

3. User edits speaker names, generates summary
   → state.speakerNames = { 0: "Alice", 1: "Bob" }
   → state.summary = "Interview..."
   → state.title = "AI Discussion"

4. User loads Audio B (via upload, YouTube, or podcast)
   → handleFileUpload/handleYoutubeFetch/downloadEpisode called
   → Audio player source changed ✓
   → state.audioUrl/uploadedFile updated ✓
   → BUT state.speakerNames, summary, title NOT cleared ✗

5. User transcribes Audio B
   → resetTranscriptionState() called
   → Clears utterances, diarization stats ✓
   → BUT does NOT clear speakerNames, summary, title ✗
   → New transcription with old speaker names appears!

Solution Design

Design Principles

  1. Complete Reset: Clear ALL session-specific data when source changes
  2. Clear Intent: Reset should happen as soon as a new source is loaded
  3. Separation of Concerns:
    • Transcription reset: Clear transcription-related data
    • Session reset: Clear ALL session data including summary, title, speaker names
  4. Consistent Behavior: Same reset logic for all source types (upload, YouTube, podcast)
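
One way to enforce principle 4 is to route every source change through a single helper, so no handler can forget the reset. This is only a sketch of that idea, not what the fix below does: createSourceLoader and applyNewSource are hypothetical names, the URL is a placeholder, and updating the audio player is left to the caller.

```javascript
// Hypothetical helper: one place that resets the session and records the source.
function createSourceLoader(state, resetCompleteSession) {
  return function applyNewSource({ file = null, audioUrl = null }) {
    // Validate first so a bad call does not wipe the current session.
    if (!file && !audioUrl) {
      throw new Error('applyNewSource needs a file or an audioUrl');
    }
    resetCompleteSession();
    state.uploadedFile = file;
    state.audioUrl = file ? null : audioUrl;
  };
}

// Demo with a reduced state object and a stand-in reset function.
const state = {
  uploadedFile: null,
  audioUrl: null,
  speakerNames: { 0: 'Alice' },
  summary: 'old summary',
  title: 'old title',
};
let resetCalls = 0;
const applyNewSource = createSourceLoader(state, () => {
  resetCalls += 1;
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
});

// Placeholder URL standing in for data.audioUrl from the backend.
applyNewSource({ audioUrl: '/media/episode.mp3' });
console.log(resetCalls, state.audioUrl, state.speakerNames);
```

With this shape, the upload handler would call applyNewSource({ file }) and the YouTube and podcast handlers would call applyNewSource({ audioUrl: data.audioUrl }).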

Two-Level Reset Strategy

Level 1: Reset Transcription Data (Existing)

When: Before starting new transcription
What: Utterances, diarization stats, transcript UI

function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}

Level 2: Reset Complete Session (NEW)

When: When new audio source is loaded
What: Everything from Level 1 + speaker names + summary + title

function resetCompleteSession() {
  // Step 1: Reset transcription data (Level 1)
  resetTranscriptionState();
  
  // Step 2: Reset speaker names
  state.speakerNames = {};
  
  // Step 3: Reset summary and title (state and UI)
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  
  // Step 4: Re-render timeline segments
  renderTimelineSegments();  // Will be empty with no utterances
  
  // Optional: Hide detect speaker names button
  elements.detectSpeakerNamesBtn.classList.add('hidden');
}

Implementation

Change 1: Create resetCompleteSession() Function

File: frontend/app.js (after resetTranscriptionState())

function resetCompleteSession() {
  // Reset transcription data
  resetTranscriptionState();
  
  // Reset speaker names
  state.speakerNames = {};
  
  // Reset summary and title
  state.summary = '';
  state.title = '';
  
  // Clear summary and title UI
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  
  // Reset timeline visualization
  renderTimelineSegments();
  
  // Hide speaker name detection button
  elements.detectSpeakerNamesBtn.classList.add('hidden');
  
  // Reset status
  setStatus('Ready for new transcription', 'info');
}
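
To exercise resetCompleteSession() outside the browser (for example under Node), the DOM can be replaced with small stand-ins. The harness below is a sketch under that assumption: fakeElement(), the stubbed setStatus() and renderTimelineSegments(), and the sample session data are all illustrative and not part of app.js.

```javascript
// Stand-in for a DOM element: only the members the reset functions touch.
function fakeElement() {
  const classes = new Set();
  return {
    innerHTML: 'stale',
    textContent: 'stale',
    classList: {
      add: (name) => classes.add(name),
      contains: (name) => classes.has(name),
    },
  };
}

const elements = {
  transcriptList: fakeElement(),
  utteranceCount: fakeElement(),
  diarizationPanel: fakeElement(),
  summaryOutput: fakeElement(),
  titleOutput: fakeElement(),
  detectSpeakerNamesBtn: fakeElement(),
};

// Session left over from a previous audio file (sample data).
const state = {
  utterances: [{ speaker: 0, text: 'Hello' }],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: { 0: 'Alice', 1: 'Bob' },
  summary: 'Interview about AI...',
  title: 'AI Discussion',
};
let activeUtteranceIndex = 0;

// Stand-ins for app.js helpers whose bodies are not shown in this document.
let lastStatus = null;
function setStatus(message, level) {
  lastStatus = { message, level };
}
function renderTimelineSegments() {
  // No-op in this harness; the real function redraws the timeline.
}

function resetTranscriptionState() {
  state.utterances = [];
  state.diarizedUtterances = null;
  state.diarizationStats = null;
  activeUtteranceIndex = -1;
  elements.transcriptList.innerHTML = '';
  elements.utteranceCount.textContent = '';
  elements.diarizationPanel.classList.add('hidden');
}

function resetCompleteSession() {
  resetTranscriptionState();
  state.speakerNames = {};
  state.summary = '';
  state.title = '';
  elements.summaryOutput.innerHTML = '';
  elements.titleOutput.textContent = '';
  renderTimelineSegments();
  elements.detectSpeakerNamesBtn.classList.add('hidden');
  setStatus('Ready for new transcription', 'info');
}

resetCompleteSession();
console.log(state.speakerNames, state.summary, state.title, lastStatus);
```

After the call, every session field is back to its initial value and the stubbed summary and title elements are empty.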

Change 2: Call Reset on File Upload

File: frontend/app.js:handleFileUpload() (lines ~1119-1127)

function handleFileUpload(event) {
  const file = event.target.files?.[0];
  if (!file) return;
  
  // Reset complete session when new file loaded
  resetCompleteSession();
  
  state.uploadedFile = file;
  state.audioUrl = null;
  const objectUrl = URL.createObjectURL(file);
  elements.audioPlayer.src = objectUrl;
  setStatus(`Loaded ${file.name}`, 'info');
}

Change 3: Call Reset on YouTube Fetch

File: frontend/app.js:handleYoutubeFetch() (lines ~1129-1147)

async function handleYoutubeFetch() {
  if (!elements.youtubeUrl.value.trim()) return;
  setStatus('Downloading audio from YouTube...', 'info');
  try {
    const res = await fetch('/api/youtube/fetch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: elements.youtubeUrl.value.trim() }),
    });
    if (!res.ok) throw new Error('YouTube download failed');
    const data = await res.json();
    
    // Reset complete session when new YouTube audio loaded
    resetCompleteSession();
    
    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('YouTube audio ready', 'success');
  } catch (err) {
    console.error(err);
    setStatus(err.message, 'error');
  }
}

Change 4: Call Reset on Podcast Episode Download

File: frontend/app.js:downloadEpisode() (lines ~1226-1258)

async function downloadEpisode(audioUrl, title, triggerButton = null) {
  setStatus('Downloading episode...', 'info');
  let originalLabel = null;
  if (triggerButton) {
    originalLabel = triggerButton.innerHTML;
    triggerButton.disabled = true;
    triggerButton.classList.add('loading');
    triggerButton.textContent = 'Downloading…';
  }
  try {
    const res = await fetch('/api/podcast/download', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ audioUrl, title }),
    });
    if (!res.ok) throw new Error('Episode download failed');
    const data = await res.json();
    
    // Reset complete session when new episode loaded
    resetCompleteSession();
    
    state.audioUrl = data.audioUrl;
    state.uploadedFile = null;
    elements.audioPlayer.src = data.audioUrl;
    setStatus('Episode ready', 'success');
    // ... rest of the function
  } catch (err) {
    // ... error handling
  }
}

Behavior After Fix

Example Scenario

Step 1: Load and Process First Audio

1. Upload "interview.mp3"
   → resetCompleteSession() called
   → Clean slate: no utterances, speaker names, summary, title

2. Transcribe
   → resetTranscriptionState() called (redundant but harmless)
   → Transcript appears, 2 speakers detected

3. Edit speaker names
   → state.speakerNames = { 0: "Alice", 1: "Bob" }

4. Generate summary
   → state.summary = "Interview about AI..."
   → state.title = "AI Discussion"

Step 2: Load Different Audio

1. Upload "meeting.mp3"
   → resetCompleteSession() called ✓
   → state.speakerNames = {} (cleared) ✓
   → state.summary = '' (cleared) ✓
   → state.title = '' (cleared) ✓
   → Summary UI cleared ✓
   → Title UI cleared ✓
   → Timeline cleared ✓
   → Status: "Loaded meeting.mp3"

2. Transcribe
   → Fresh transcript with 3 speakers
   → Speaker tags show: "Speaker 1", "Speaker 2", "Speaker 3" ✓
   → No contamination from previous audio ✓

Step 3: Generate New Summary

1. Click "Generate Summary"
   → New summary generated for current audio ✓
   → Replaces old summary (already cleared) ✓
   → New title generated ✓

Edge Cases

Edge Case 1: Upload Same File Twice

1. Upload "audio.mp3"
   → resetCompleteSession() called
2. Transcribe and edit
3. Upload same "audio.mp3" again
   → resetCompleteSession() called (data cleared)
   → User must transcribe again
   
Decision: Acceptable - user explicitly chose to reload

Edge Case 2: Change Source During Transcription

1. Start transcription of "audio1.mp3"
2. Mid-transcription, upload "audio2.mp3"
   → resetCompleteSession() called
   → Partial transcription cleared
   → New audio loaded

Decision: Acceptable - user action indicates intent to switch
Note: Transcription abort handling already exists
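
The snippets in this document do not show resetCompleteSession() invoking that abort handling. If it does not, an in-flight request could keep delivering results for the old audio after the reset. The sketch below shows one way to cancel it, assuming state.transcriptionController and state.summaryController hold standard AbortController instances (their names suggest this, but the document does not confirm it); abortInFlightWork is a hypothetical helper.

```javascript
// Hypothetical helper: cancel any request still tied to the previous source.
// Assumes the controller fields hold AbortController instances or null.
function abortInFlightWork(state) {
  for (const key of ['transcriptionController', 'summaryController']) {
    const controller = state[key];
    if (controller) {
      controller.abort();
      state[key] = null;
    }
  }
}

// Demo: a transcription is running when the user switches source.
const state = {
  transcriptionController: new AbortController(),
  summaryController: null,
};
const signal = state.transcriptionController.signal;

abortInFlightWork(state);
console.log(signal.aborted); // true
```

A call to such a helper would fit at the top of resetCompleteSession(), before the state is cleared.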

Edge Case 3: YouTube Fetch While Audio Playing

1. Upload file, play audio
2. Fetch YouTube audio
   → resetCompleteSession() called
   → Audio player source changed
   → Playback stops (normal behavior)

Decision: Acceptable - expected behavior when changing source

Edge Case 4: Multiple Podcast Episodes in Sequence

1. Download episode 1
   → resetCompleteSession()
2. Transcribe episode 1
3. Download episode 2
   → resetCompleteSession() (episode 1 data cleared)
4. Transcribe episode 2
4. Transcribe episode 2

Decision: Correct behavior - each episode is independent

UI Elements to Reset

Complete Checklist

State Variables:

  • state.utterances (via resetTranscriptionState)
  • state.diarizedUtterances (via resetTranscriptionState)
  • state.diarizationStats (via resetTranscriptionState)
  • state.speakerNames (NEW)
  • state.summary (NEW)
  • state.title (NEW)
  • activeUtteranceIndex (via resetTranscriptionState)

DOM Elements:

  • elements.transcriptList (via resetTranscriptionState)
  • elements.utteranceCount (via resetTranscriptionState)
  • elements.diarizationPanel (via resetTranscriptionState)
  • elements.diarizationMetrics (via renderDiarizationStats after reset)
  • elements.speakerBreakdown (via renderDiarizationStats after reset)
  • elements.summaryOutput (NEW)
  • elements.titleOutput (NEW)
  • elements.timelineSegments (via renderTimelineSegments)
  • elements.detectSpeakerNamesBtn visibility (NEW)
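
The state half of this checklist can be turned into an automated check. The following is a sketch only: CLEAN_STATE_CHECKS and findStaleFields are hypothetical names, and the expected values are taken from the initial state object shown earlier.

```javascript
// Expected post-reset value for each state field in the checklist.
const CLEAN_STATE_CHECKS = {
  utterances: (v) => Array.isArray(v) && v.length === 0,
  diarizedUtterances: (v) => v === null,
  diarizationStats: (v) => v === null,
  speakerNames: (v) =>
    v !== null && typeof v === 'object' && Object.keys(v).length === 0,
  summary: (v) => v === '',
  title: (v) => v === '',
};

// Returns the names of fields that still hold data from a previous session.
function findStaleFields(state) {
  return Object.entries(CLEAN_STATE_CHECKS)
    .filter(([field, isClean]) => !isClean(state[field]))
    .map(([field]) => field);
}

// Demo: the state as it looks after only the Level 1 reset has run.
const afterLevel1Only = {
  utterances: [],
  diarizedUtterances: null,
  diarizationStats: null,
  speakerNames: { 0: 'Alice', 1: 'Bob' },
  summary: 'Interview about AI...',
  title: '',
};
const stale = findStaleFields(afterLevel1Only);
console.log(stale);
```

An empty result means the state matches the checklist; any listed field was missed by the reset.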

Testing Scenarios

✅ Test 1: Upload → Edit → Upload New

  1. Upload "audio1.mp3"
  2. Transcribe, edit speaker names to "Alice", "Bob"
  3. Generate summary "Summary 1"
  4. Upload "audio2.mp3"
  5. Verify: Speaker names cleared, summary cleared, title cleared
  6. Transcribe
  7. Verify: Speaker tags show "Speaker 1", "Speaker 2" (not Alice/Bob)

✅ Test 2: YouTube → Summary → Podcast

  1. Fetch YouTube audio
  2. Transcribe, generate summary
  3. Download podcast episode
  4. Verify: YouTube summary cleared
  5. Transcribe podcast
  6. Verify: Independent transcript and summary

✅ Test 3: Podcast → Names → YouTube

  1. Download podcast
  2. Transcribe, detect speaker names
  3. Fetch YouTube audio
  4. Verify: Podcast speaker names cleared
  5. Transcribe YouTube
  6. Verify: No podcast names visible

✅ Test 4: Rapid Source Changes

  1. Upload file
  2. Immediately fetch YouTube (before transcription)
  3. Verify: File data cleared, YouTube ready
  4. Immediately download podcast
  5. Verify: YouTube data cleared, podcast ready
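
Because the YouTube and podcast handlers await a network response before resetting, two overlapping requests could finish in either order, and the slower one would win. The fix described here does not address that. A possible guard is sketched below, with createLoadGuard as a hypothetical helper: each load takes a ticket, and only the most recent ticket is allowed to commit.

```javascript
// Hypothetical guard: each load attempt takes a ticket; only the newest may commit.
function createLoadGuard() {
  let latest = 0;
  return {
    // Call when the user starts loading a source.
    begin() {
      latest += 1;
      return latest;
    },
    // Call after the await; false means a newer load has started since.
    isCurrent(ticket) {
      return ticket === latest;
    },
  };
}

const guard = createLoadGuard();
const youtubeTicket = guard.begin(); // user starts a YouTube fetch
const podcastTicket = guard.begin(); // then immediately starts a podcast download

console.log(guard.isCurrent(youtubeTicket)); // false
console.log(guard.isCurrent(podcastTicket)); // true
```

In a handler, the check would sit right after the response is parsed: if the ticket is no longer current, return before calling resetCompleteSession().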

✅ Test 5: Same Source Reload

  1. Upload "audio.mp3", transcribe, edit
  2. Upload same "audio.mp3" again
  3. Verify: Previous edits cleared (fresh start)

✅ Test 6: Timeline Visualization

  1. Upload audio, transcribe (timeline segments appear)
  2. Upload different audio
  3. Verify: Timeline segments cleared (empty)
  4. Transcribe new audio
  5. Verify: New timeline segments appear

Performance Considerations

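  • resetCompleteSession(): a fixed number of assignments plus DOM clearing; the only variable cost is removing the previously rendered transcript nodes
  • Called only on source change: Infrequent user action
  • Impact: Negligible (estimated <1ms)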

Backward Compatibility

  • ✅ Existing resetTranscriptionState() unchanged
  • ✅ New function adds capability, doesn't break existing code
  • ✅ No API changes required
  • ✅ No breaking changes to user workflow

Implementation Checklist

  • Create resetCompleteSession() function
  • Update handleFileUpload() to call reset
  • Update handleYoutubeFetch() to call reset
  • Update downloadEpisode() to call reset
  • Test all source change scenarios
  • Verify UI elements cleared
  • Verify no data contamination between sessions
  • Update documentation
  • Commit changes

Related Bugs

  • Bug 2.4.1: Manual speaker name propagation (Fixed)
  • Bug 2.4.2: Auto-detection UI update (Fixed)
  • Bug 2.4.3: Clear name to enable detection (Fixed)
  • Bug 2.4.4: State persistence across audio files (This bug)

Files to Modify

/home/luigi/VoxSum/frontend/app.js

  • New Function: resetCompleteSession() (after line 273)
  • Modify: handleFileUpload() (line ~1122)
  • Modify: handleYoutubeFetch() (line ~1141)
  • Modify: downloadEpisode() (line ~1239)
  • Impact: ~40 lines added/modified

Conclusion

The bug is caused by incomplete state reset when audio sources change. The solution is to create a comprehensive resetCompleteSession() function that clears ALL session data (transcription, speaker names, summary, title) and call it whenever a new audio source is loaded (file upload, YouTube, podcast). This ensures a clean slate for each audio file and prevents data contamination between sessions.