
Speaker Name Conflict Resolution - Enhancement

Date

October 1, 2025

Overview

This enhancement resolves conflicts between user-edited speaker names and automatically detected ones by letting users intentionally clear a name so that automatic detection can fill it in again.

Current Behavior Analysis

Existing Merge Logic

Location: frontend/app.js:handleSpeakerNameDetection() (lines ~1030-1037)

// Merge detected names with existing user-edited names (preserve user edits)
const mergedNames = { ...speakerNames };
if (state.speakerNames) {
  Object.entries(state.speakerNames).forEach(([speakerId, info]) => {
    if (info.confidence === 'user') {
      // Preserve user-edited names
      mergedNames[speakerId] = info;
    }
  });
}

Current Logic:

  • ✅ Preserves ALL user-edited names (confidence === 'user')
  • ✅ Prevents auto-detection from overwriting user edits
  • ❌ Does NOT allow user to "reset" a name to enable auto-detection

Existing Edit Logic

Location: frontend/app.js:startSpeakerEdit() (lines ~665-680)

const finishEdit = (save = true) => {
  const newName = input.value.trim();
  if (save && newName) {
    // Update state with user-edited name
    state.speakerNames[speakerId] = {
      name: newName,
      confidence: 'user',
      reason: 'User edited'
    };
    // Force re-render...
  } else {
    // Restore original name (doesn't remove from state)
    const originalName = state.speakerNames?.[speakerId]?.name 
      || `Speaker ${speakerId + 1}`;
    speakerTag.textContent = originalName;
  }
};

Current Logic:

  • ✅ Saves non-empty names as user-edited
  • ❌ Empty input (after trim) restores original name, doesn't clear it
  • ❌ No way to "clear" a user-edited name to allow auto-detection

Problem Statement (Bug 2.4.3)

User Story

As a user, I want to:

  1. Manually edit speaker names when I know them
  2. Have my edits protected from auto-detection override
  3. Clear/reset a name to allow auto-detection to try again
  4. Have auto-detection fill only the names I have not edited (including names I cleared)

Current Limitations

Scenario 1: User Cannot Clear Name

1. User edits: "Speaker 1" → "John"
   state.speakerNames[0] = { name: "John", confidence: "user" }

2. User realizes this is wrong, tries to clear it
   - Edits tag, deletes all text, presses Enter
   - Expected: Name cleared, tag shows "Speaker 1"
   - Actual: Name restored to "John" (not cleared)

3. User clicks "Detect Speaker Names"
   - Expected: Auto-detection fills in a name for speaker 0
   - Actual: "John" preserved (auto-detection blocked)

4. User is stuck with the wrong name!

Scenario 2: No Way to Reset

User workflow:
1. Manually name speakers: 0="Alice", 1="Bob"
2. Later realize they're wrong
3. Want auto-detection to try again
4. No way to "reset" to allow auto-detection

Current workaround: None in the UI; the user has to reload the page or type the correct name manually.

Solution Design

Design Principles

  1. Explicit Intent: Empty input should signal "clear this name"
  2. Selective Override: Auto-detection should only fill names the user has not set (cleared or never edited)
  3. User Control: User can always override auto-detection
  4. Clear Reset Path: User can clear name to enable auto-detection

State Management Strategy

Three States for Speaker Names

// State 1: User-edited (protected from auto-detection)
state.speakerNames[speakerId] = { name: "John", confidence: "user", reason: "User edited" };

// State 2: Auto-detected (can be replaced by a user edit or a later detection run)
state.speakerNames[speakerId] = { name: "Alice", confidence: "high", reason: "Self-introduction" };

// State 3: Cleared (allows auto-detection)
delete state.speakerNames[speakerId];  // Entry removed; lookup returns undefined

Key Decision: When user clears a name, remove it from state.speakerNames entirely.
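As a minimal sketch of how the three states can be told apart at lookup time, the snippet below uses two helpers, `displayName` and `isUserEdited`. Both names are hypothetical and do not come from app.js; only the entry shape and the "Speaker N" default are taken from this document.

```javascript
// Hypothetical helpers; only the entry shape and the "Speaker N" default
// are taken from this document.
function displayName(speakerNames, speakerId) {
  // Cleared or never-named speakers have no entry, so fall back to the default.
  return speakerNames?.[speakerId]?.name || `Speaker ${speakerId + 1}`;
}

function isUserEdited(speakerNames, speakerId) {
  // Only user-edited entries are shielded from auto-detection.
  return speakerNames?.[speakerId]?.confidence === 'user';
}

const speakerNames = {
  0: { name: 'John', confidence: 'user', reason: 'User edited' },
  1: { name: 'Alice', confidence: 'high', reason: 'Self-introduction' },
  // Speaker 2 has no entry: cleared or never named.
};

const summary = [0, 1, 2].map((id) => ({
  id,
  label: displayName(speakerNames, id),
  userEdited: isUserEdited(speakerNames, id),
}));
console.log(summary);
```

Because a cleared speaker and a never-named speaker look identical (no key), no extra "cleared" flag has to be stored or migrated.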

Logic Flow

Flow 1: Manual Edit (Non-Empty)

User edits tag: "" or "Speaker 1" → "John"
↓
Input validation: newName.trim() !== ""
↓
state.speakerNames[0] = { name: "John", confidence: "user" }
↓
Re-render UI: All tags for speaker 0 show "John"
↓
Auto-detection: Skips speaker 0 (user-edited)

Flow 2: Manual Clear (Empty)

User edits tag: "John" → "" (empty)
↓
Input validation: newName.trim() === ""
↓
delete state.speakerNames[0]  // Remove from state
↓
Re-render UI: Tags show "Speaker 1" (default)
↓
Auto-detection: Can fill speaker 0 (no longer user-edited)

Flow 3: Auto-Detection Merge

Auto-detection returns: { 0: {name: "Alice", ...}, 1: {name: "Bob", ...} }
↓
Merge logic checks each speaker:
  - Speaker 0: state.speakerNames[0] has confidence="user"?
    → YES: Keep "John", skip "Alice"
    → NO (cleared, never named, or auto-detected): Use "Alice"
  - Speaker 1: state.speakerNames[1] has confidence="user"?
    → YES: Keep user name
    → NO: Use "Bob"
↓
Re-render UI with merged names
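The merge in Flow 3 can be sketched as a pure function, which makes the rule easy to check in isolation. The name `mergeSpeakerNames` and the sample data are assumptions for illustration; the loop body mirrors the merge code quoted from handleSpeakerNameDetection().

```javascript
// Hypothetical pure version of the merge step. `detected` stands in for the
// auto-detection result and `current` for state.speakerNames.
function mergeSpeakerNames(current, detected) {
  const merged = { ...detected };
  for (const [speakerId, info] of Object.entries(current ?? {})) {
    if (info.confidence === 'user') {
      merged[speakerId] = info; // user edit wins over the detected name
    }
  }
  return merged;
}

// Sample data for illustration only.
const current = { 0: { name: 'John', confidence: 'user', reason: 'User edited' } };
const detected = {
  0: { name: 'Alice', confidence: 'high' },
  1: { name: 'Bob', confidence: 'high' },
};

const merged = mergeSpeakerNames(current, detected);
console.log(merged[0].name, merged[1].name); // John Bob
```

Note that the merge starts from the detected names, so an earlier auto-detected entry that the new run does not return is dropped rather than kept.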

Implementation

Change 1: Update Manual Edit Logic

File: frontend/app.js:startSpeakerEdit() (lines ~665-680)

const finishEdit = (save = true) => {
  const newName = input.value.trim();
  
  if (save) {
    if (newName) {
      // Non-empty name: Save as user-edited
      if (!state.speakerNames) state.speakerNames = {};
      state.speakerNames[speakerId] = {
        name: newName,
        confidence: 'user',
        reason: 'User edited'
      };
    } else {
      // Empty name: Clear from state (allow auto-detection)
      if (state.speakerNames && state.speakerNames[speakerId]) {
        delete state.speakerNames[speakerId];
      }
    }
    // Force re-render to update all UI elements
    renderTranscript(true);
    renderTimelineSegments();
    renderDiarizationStats();
  } else {
    // Cancel edit: Restore current name
    const originalName = state.speakerNames?.[speakerId]?.name 
      || `Speaker ${speakerId + 1}`;
    speakerTag.textContent = originalName;
    speakerTag.classList.add('editable-speaker');
  }
};

Key Changes:

  • Added else branch for empty input
  • delete state.speakerNames[speakerId] removes from state
  • Triggers re-render even for empty input (to show default name)
  • Cancel (Escape) still restores original name without clearing
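To exercise the save, cancel, and clear paths without a DOM, the state update can be pulled out into a small function. This is a sketch under that assumption: `applySpeakerEdit` is a hypothetical name, and the rendering calls are replaced by a boolean return value.

```javascript
// Hypothetical, DOM-free version of the state update inside finishEdit().
// Returns true when the caller should re-render.
function applySpeakerEdit(state, speakerId, rawInput, save = true) {
  if (!save) return false; // Escape: leave state untouched
  const newName = rawInput.trim();
  if (newName) {
    if (!state.speakerNames) state.speakerNames = {};
    state.speakerNames[speakerId] = {
      name: newName,
      confidence: 'user',
      reason: 'User edited',
    };
  } else if (state.speakerNames) {
    delete state.speakerNames[speakerId]; // no-op if the key is absent
  }
  return true;
}

const state = { speakerNames: null };
applySpeakerEdit(state, 0, '  John ');   // save: stored as "John"
applySpeakerEdit(state, 0, '', false);   // cancel: "John" stays
const afterCancel = state.speakerNames[0].name;
applySpeakerEdit(state, 0, '   ');       // whitespace-only: cleared
const afterClear = 0 in state.speakerNames;
console.log(afterCancel, afterClear); // John false
```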

Change 2: Enhance Merge Logic (Already Correct!)

File: frontend/app.js:handleSpeakerNameDetection() (lines ~1030-1037)

// Current code is already correct!
const mergedNames = { ...speakerNames };
if (state.speakerNames) {
  Object.entries(state.speakerNames).forEach(([speakerId, info]) => {
    if (info.confidence === 'user') {
      // Preserve user-edited names
      mergedNames[speakerId] = info;
    }
  });
}

Why it's correct:

  • If a speaker was cleared, its key has been deleted from state.speakerNames
  • Object.entries() only visits keys that still exist, so the loop never sees the cleared speaker
  • Auto-detected name fills the gap ✓

No changes needed here!
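The reasoning above depends on the key being deleted rather than set to undefined. The following sketch, using made-up sample data, shows the difference for a loop that reads info.confidence without a guard, as the merge code does.

```javascript
const names = {
  0: { name: 'John', confidence: 'user' },
  1: { name: 'Alice', confidence: 'user' },
};

// Deleting the key: Object.entries() no longer yields speaker 0.
delete names[0];
const visited = Object.entries(names).map(([speakerId]) => speakerId);
console.log(visited); // [ '1' ]

// Assigning undefined keeps the key, so reading info.confidence throws.
names[1] = undefined;
let threw = false;
try {
  Object.entries(names).forEach(([, info]) => info.confidence);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true
```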

Behavior Examples

Example 1: Clear and Auto-Detect

Initial State:

state.speakerNames = {
  0: { name: "John", confidence: "user" },
  1: { name: "Alice", confidence: "user" }
}

Transcript:
[John]  Hello everyone...
[Alice] Hi there...
[John]  Today we'll discuss...

User Action 1: Clear Speaker 0

User clicks "John" tag, deletes all text, presses Enter

state.speakerNames = {
  1: { name: "Alice", confidence: "user" }
}
// Speaker 0 removed from state

Transcript:
[Speaker 1] Hello everyone...     ← Shows default
[Alice]     Hi there...           ← User-edited preserved
[Speaker 1] Today we'll discuss... ← Shows default

User Action 2: Click "Detect Speaker Names"

Auto-detection returns:
{ 0: {name: "Dr. Smith", confidence: "high"} }

Merge logic:
- Speaker 0: No user edit → Use "Dr. Smith" ✓
- Speaker 1: User edited → Keep "Alice" ✓

state.speakerNames = {
  0: { name: "Dr. Smith", confidence: "high" },
  1: { name: "Alice", confidence: "user" }
}

Transcript:
[Dr. Smith] Hello everyone...     ← Auto-detected
[Alice]     Hi there...           ← User-edited preserved
[Dr. Smith] Today we'll discuss... ← Auto-detected

Example 2: Edit, Clear, Edit Again

Initial State:

state.speakerNames = {}

Transcript:
[Speaker 1] Hello...
[Speaker 2] Hi...

Step 1: User edits

Edit Speaker 1 → "Wrong Name"

state.speakerNames = {
  0: { name: "Wrong Name", confidence: "user" }
}

Transcript:
[Wrong Name] Hello...

Step 2: User realizes mistake, clears

Edit "Wrong Name" → "" (empty)

state.speakerNames = {}  // Speaker 0 removed

Transcript:
[Speaker 1] Hello...  ← Back to default

Step 3: User edits correctly

Edit Speaker 1 → "Correct Name"

state.speakerNames = {
  0: { name: "Correct Name", confidence: "user" }
}

Transcript:
[Correct Name] Hello...

Example 3: Cancel vs Clear

Scenario A: Cancel Edit (Escape key)

Tag shows "John"
User clicks tag, deletes text, presses Escape
→ Input cancelled, "John" restored
→ state.speakerNames unchanged

Scenario B: Clear Edit (Enter key)

Tag shows "John"
User clicks tag, deletes text, presses Enter
→ Edit saved with empty value
→ state.speakerNames[speakerId] deleted
→ Tag shows "Speaker 1" (default)
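The two scenarios differ only in the argument passed to finishEdit(). How the keys are actually wired in app.js is not shown in this document, so the handler below is a hypothetical sketch; `handleEditKey` and the stand-in event objects are assumptions.

```javascript
// Hypothetical key handler; finishEdit(save) is the function from this document.
function handleEditKey(event, finishEdit) {
  if (event.key === 'Enter') {
    event.preventDefault();
    finishEdit(true);   // commit: an empty value clears the name
  } else if (event.key === 'Escape') {
    event.preventDefault();
    finishEdit(false);  // cancel: state is left as-is
  }
}

// Exercise the handler with stand-in event objects instead of real DOM events.
const calls = [];
const fakeEvent = (key) => ({ key, preventDefault() {} });
handleEditKey(fakeEvent('Enter'), (save) => calls.push(save));
handleEditKey(fakeEvent('Escape'), (save) => calls.push(save));
handleEditKey(fakeEvent('a'), (save) => calls.push(save));
console.log(calls); // [ true, false ]
```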

Edge Cases

Edge Case 1: Clear Non-Existent Name

Speaker has default name "Speaker 1" (not in state)
User clicks tag, clears (empty input), presses Enter

Check: state.speakerNames[0] exists?
→ NO: Nothing to delete
→ Result: No error, shows "Speaker 1" (unchanged)

Edge Case 2: Clear During Transcription

Live transcription in progress
User clears speaker name

Result:
- Name cleared from state ✓
- Re-render triggered ✓
- New utterances show default name ✓
- Incremental rendering preserved ✓

Edge Case 3: Clear All Names

User clears all speaker names

state.speakerNames = {}

Auto-detection:
- All speakers available for detection ✓
- Can fill all empty names ✓

Edge Case 4: Whitespace-Only Input

User enters "   " (spaces only)

Validation: "   ".trim() === ""
→ Treated as empty input
→ Name cleared from state ✓

Testing Scenarios

✅ Test 1: Clear User-Edited Name

  1. Edit "Speaker 1" → "John"
  2. Verify: Tag shows "John"
  3. Edit "John" → "" (empty)
  4. Verify: Tag shows "Speaker 1"
  5. Verify: state.speakerNames[0] is undefined

✅ Test 2: Auto-Detection After Clear

  1. Edit "Speaker 1" → "Wrong"
  2. Click "Detect Speaker Names"
  3. Verify: "Wrong" preserved (not overwritten)
  4. Clear "Wrong" → ""
  5. Click "Detect Speaker Names" again
  6. Verify: Auto-detected name appears

✅ Test 3: Cancel Does Not Clear

  1. Tag shows "John"
  2. Click tag, delete text
  3. Press Escape
  4. Verify: Tag shows "John" (restored)
  5. Verify: state.speakerNames[0] unchanged

✅ Test 4: Empty Edit Triggers Re-Render

  1. Tag shows "John"
  2. Clear name → ""
  3. Verify: All tags for speaker 0 show "Speaker 1"
  4. Verify: Timeline segments updated
  5. Verify: Stats panel updated

✅ Test 5: Clear and Re-Edit

  1. Edit "Speaker 1" → "First"
  2. Clear "First" → ""
  3. Edit "Speaker 1" → "Second"
  4. Verify: All tags show "Second"
  5. Verify: Protected from auto-detection

✅ Test 6: Whitespace Handling

  1. Edit "Speaker 1" → " " (spaces)
  2. Press Enter
  3. Verify: Treated as empty, name cleared
  4. Verify: Tag shows "Speaker 1"
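Test 2 lends itself to automation because it only touches state. The sketch below walks through its six steps with compact stand-ins for the edit and detection paths; `edit`, `detect`, and the detection payload are hypothetical and simplified from the code in this document.

```javascript
// Compact stand-ins for the edit and merge steps described in this document.
const state = { speakerNames: {} };

function edit(speakerId, rawInput) {
  const name = rawInput.trim();
  if (name) {
    state.speakerNames[speakerId] = { name, confidence: 'user', reason: 'User edited' };
  } else {
    delete state.speakerNames[speakerId];
  }
}

function detect(detected) {
  const merged = { ...detected };
  for (const [id, info] of Object.entries(state.speakerNames)) {
    if (info.confidence === 'user') merged[id] = info;
  }
  state.speakerNames = merged;
}

// Sample detection result for illustration only.
const detection = { 0: { name: 'Dr. Smith', confidence: 'high' } };

edit(0, 'Wrong');                                      // step 1
detect(detection);                                     // step 2
const afterFirstDetect = state.speakerNames[0].name;   // step 3
edit(0, '');                                           // step 4
detect(detection);                                     // step 5
const afterSecondDetect = state.speakerNames[0].name;  // step 6
console.log(afterFirstDetect, afterSecondDetect); // Wrong Dr. Smith
```

The UI-facing checks (tags, timeline, stats panel) still need a browser or DOM test environment.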

User Experience Improvements

Visual Feedback

Consider adding visual hints:

/* Indicate clearable/editable state */
.speaker-tag.editable-speaker {
  cursor: text;
  border-style: dashed;  /* Hint: editable */
}

.speaker-tag.editable-speaker:hover::after {
  content: " ✎";  /* Pencil icon */
  opacity: 0.5;
}

Tooltip Enhancement

// In createUtteranceElement()
if (speakerInfo?.confidence === 'user') {
  speakerTag.title = 'User-edited name (click to edit or clear)';
} else if (speakerInfo?.confidence === 'high') {
  speakerTag.title = 'Auto-detected name (click to override)';
} else {
  speakerTag.title = 'Click to edit speaker name';
}

Clear Button (Optional)

Add explicit "Clear" button in edit mode:

<input class="speaker-edit-input" value="John" />
<button class="clear-speaker-btn" title="Clear and allow auto-detection">×</button>
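One way the optional button could reuse the existing clear path is to empty the input and then save, so no separate state logic is needed. The handler name `handleClearClick` and the stand-in objects are assumptions for illustration.

```javascript
// Hypothetical click handler for the optional clear button.
function handleClearClick(input, finishEdit) {
  input.value = '';   // same as deleting all text
  finishEdit(true);   // same as pressing Enter
}

// Stand-in objects so the sketch runs without a DOM.
const input = { value: 'John' };
let savedWith = null;
handleClearClick(input, (save) => {
  savedWith = { save, value: input.value };
});
console.log(savedWith); // { save: true, value: '' }
```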

Implementation Checklist

  • Analyze current merge logic
  • Design clear/reset mechanism
  • Document three speaker name states
  • Implement empty input handling in startSpeakerEdit()
  • Test manual clear functionality
  • Test auto-detection after clear
  • Test cancel vs clear behavior
  • Verify timeline and stats panel sync
  • Update documentation
  • Commit changes

Files to Modify

/home/luigi/VoxSum/frontend/app.js

  • Function: startSpeakerEdit() (lines ~665-690)
  • Change: Add else branch for empty input to delete from state
  • Impact: ~10 lines modified

No Other Files Required

  • Merge logic already correct (no changes needed)
  • UI rendering already supports undefined names (shows default)

Performance Considerations

  • Delete operation: O(1) - fast
  • Re-render trigger: Same as edit (~10-50ms)
  • Memory: Reduces state size (removes cleared entries)

Backward Compatibility

  • ✅ Existing user-edited names preserved
  • ✅ Auto-detection logic unchanged
  • ✅ Default name display unchanged
  • ✅ No breaking changes to API

Conclusion

The enhancement lets users intentionally clear a speaker name so that auto-detection can fill it again, giving them an explicit "reset" path while user edits stay protected. The change is small (one else branch in finishEdit()); the merge logic and default-name rendering are left untouched.