Speaker Name Conflict Resolution - Enhancement
Date
October 1, 2025
Overview
Enhancement to handle conflicts between user-edited speaker names and automatically detected names. Users can intentionally clear a name so that automatic detection can fill it again.
Current Behavior Analysis
Existing Merge Logic
Location: frontend/app.js:handleSpeakerNameDetection() (lines ~1030-1037)
// Merge detected names with existing user-edited names (preserve user edits)
const mergedNames = { ...speakerNames };
if (state.speakerNames) {
  Object.entries(state.speakerNames).forEach(([speakerId, info]) => {
    if (info.confidence === 'user') {
      // Preserve user-edited names
      mergedNames[speakerId] = info;
    }
  });
}
Current Logic:
- ✅ Preserves ALL user-edited names (confidence === 'user')
- ✅ Prevents auto-detection from overwriting user edits
- ❌ Does NOT allow user to "reset" a name to enable auto-detection
Existing Edit Logic
Location: frontend/app.js:startSpeakerEdit() (lines ~665-680)
const finishEdit = (save = true) => {
  const newName = input.value.trim();
  if (save && newName) {
    // Update state with user-edited name
    state.speakerNames[speakerId] = {
      name: newName,
      confidence: 'user',
      reason: 'User edited'
    };
    // Force re-render...
  } else {
    // Restore original name (doesn't remove from state)
    const originalName = state.speakerNames?.[speakerId]?.name
      || `Speaker ${speakerId + 1}`;
    speakerTag.textContent = originalName;
  }
};
Current Logic:
- ✅ Saves non-empty names as user-edited
- ❌ Empty input (after trim) restores original name, doesn't clear it
- ❌ No way to "clear" a user-edited name to allow auto-detection
Problem Statement (Bug 2.4.3)
User Story
As a user, I want to:
- Manually edit speaker names when I know them
- Have my edits protected from auto-detection override
- Clear/reset a name to allow auto-detection to try again
- Have auto-detection only fill empty speaker names
Current Limitations
Scenario 1: User Cannot Clear Name
1. User edits: "Speaker 1" → "John"
state.speakerNames[0] = { name: "John", confidence: "user" }
2. User realizes this is wrong, tries to clear it
- Edits tag, deletes all text, presses Enter
- Expected: Name cleared, tag shows "Speaker 1"
- Actual: Name restored to "John" (not cleared)
3. User clicks "Detect Speaker Names"
- Expected: Auto-detection fills "Speaker 1"
- Actual: "John" preserved (auto-detection blocked)
4. User is stuck with wrong name!
Scenario 2: No Way to Reset
User workflow:
1. Manually name speakers: 0="Alice", 1="Bob"
2. Later realize they're wrong
3. Want auto-detection to try again
4. No way to "reset" to allow auto-detection
Current workaround: None (would need to reload page or manually correct)
Solution Design
Design Principles
- Explicit Intent: Empty input should signal "clear this name"
- Selective Override: Auto-detection should never overwrite user-edited names; it fills only names the user has not set
- User Control: User can always override auto-detection
- Clear Reset Path: User can clear name to enable auto-detection
State Management Strategy
Three States for Speaker Names
state.speakerNames[speakerId] =
// State 1: User-Edited (Protected)
{ name: "John", confidence: "user", reason: "User edited" }
// State 2: Auto-Detected (Overridable)
{ name: "Alice", confidence: "high", reason: "Self-introduction" }
// State 3: Cleared (Allows Auto-Detection)
undefined // Speaker removed from state.speakerNames
Key Decision: When user clears a name, remove it from state.speakerNames entirely.
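The three states can be told apart by inspecting the entry alone. The sketch below is illustrative only: speakerNameState and displayName are hypothetical helper names, not functions in frontend/app.js, and it assumes that any confidence value other than 'user' marks an auto-detected entry.

```javascript
// Hypothetical helpers (not part of frontend/app.js).

// Classify an entry of state.speakerNames into one of the three states.
function speakerNameState(entry) {
  if (entry === undefined || entry === null) return 'cleared';
  return entry.confidence === 'user' ? 'user-edited' : 'auto-detected';
}

// Resolve the label shown on a tag; falls back to the default "Speaker N".
function displayName(speakerNames, speakerId) {
  return speakerNames?.[speakerId]?.name || `Speaker ${Number(speakerId) + 1}`;
}

const speakerNames = {
  0: { name: 'John', confidence: 'user', reason: 'User edited' },
  1: { name: 'Alice', confidence: 'high', reason: 'Self-introduction' },
};

console.log(speakerNameState(speakerNames[0])); // user-edited
console.log(speakerNameState(speakerNames[1])); // auto-detected
console.log(speakerNameState(speakerNames[2])); // cleared
console.log(displayName(speakerNames, 2));      // Speaker 3
```

Because the cleared state is simply the absence of a key, no extra flag or sentinel value has to be stored or migrated.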
Logic Flow
Flow 1: Manual Edit (Non-Empty)
User edits tag: "" or "Speaker 1" → "John"
↓
Input validation: newName.trim() !== ""
↓
state.speakerNames[0] = { name: "John", confidence: "user" }
↓
Re-render UI: All tags for speaker 0 show "John"
↓
Auto-detection: Skips speaker 0 (user-edited)
Flow 2: Manual Clear (Empty)
User edits tag: "John" → "" (empty)
↓
Input validation: newName.trim() === ""
↓
delete state.speakerNames[0] // Remove from state
↓
Re-render UI: Tags show "Speaker 1" (default)
↓
Auto-detection: Can fill speaker 0 (no longer user-edited)
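Flows 1 and 2 differ only in whether the trimmed input is empty. As a rough, DOM-free sketch of that branch (applySpeakerEdit is a hypothetical name used for illustration; the real logic lives inside finishEdit and also triggers re-rendering):

```javascript
// Hypothetical pure helper mirroring Flows 1 and 2. It mutates and returns
// the speakerNames map it is given, and does no rendering.
function applySpeakerEdit(speakerNames, speakerId, rawInput) {
  const newName = rawInput.trim();
  if (newName) {
    // Flow 1: non-empty input is stored as a protected user edit.
    speakerNames[speakerId] = {
      name: newName,
      confidence: 'user',
      reason: 'User edited',
    };
  } else {
    // Flow 2: empty input removes the entry so auto-detection can fill it.
    delete speakerNames[speakerId];
  }
  return speakerNames;
}

const names = {};
applySpeakerEdit(names, 0, 'John');
console.log(names[0].confidence); // user
applySpeakerEdit(names, 0, '');
console.log(0 in names);          // false
```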
Flow 3: Auto-Detection Merge
Auto-detection returns: { 0: {name: "Alice", ...}, 1: {name: "Bob", ...} }
↓
Merge logic checks each speaker:
- Speaker 0: state.speakerNames[0] exists with confidence="user"?
  → YES: Keep "John", skip "Alice"
  → NO: Use "Alice"
- Speaker 1: state.speakerNames[1] exists with confidence="user"?
  → YES: Keep user name
  → NO: Use "Bob"
↓
Re-render UI with merged names
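The merge in Flow 3 can be expressed as a pure function, which makes it easy to exercise outside the browser. This is a sketch under assumptions: mergeSpeakerNames is a hypothetical name, and the confidence values on the detected entries are illustrative because the flow above elides them.

```javascript
// Hypothetical pure version of the merge step: start from the detected
// names, then let user-edited entries win.
function mergeSpeakerNames(existing, detected) {
  const merged = { ...detected };
  for (const [speakerId, info] of Object.entries(existing ?? {})) {
    if (info?.confidence === 'user') merged[speakerId] = info;
  }
  return merged;
}

const existing = { 0: { name: 'John', confidence: 'user' } };
const detected = {
  0: { name: 'Alice', confidence: 'high' }, // illustrative confidence
  1: { name: 'Bob', confidence: 'high' },   // illustrative confidence
};

const merged = mergeSpeakerNames(existing, detected);
console.log(merged[0].name); // John
console.log(merged[1].name); // Bob
```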
Implementation
Change 1: Update Manual Edit Logic
File: frontend/app.js:startSpeakerEdit() (lines ~665-680)
const finishEdit = (save = true) => {
  const newName = input.value.trim();
  if (save) {
    if (newName) {
      // Non-empty name: Save as user-edited
      if (!state.speakerNames) state.speakerNames = {};
      state.speakerNames[speakerId] = {
        name: newName,
        confidence: 'user',
        reason: 'User edited'
      };
    } else {
      // Empty name: Clear from state (allow auto-detection)
      if (state.speakerNames && state.speakerNames[speakerId]) {
        delete state.speakerNames[speakerId];
      }
    }
    // Force re-render to update all UI elements
    renderTranscript(true);
    renderTimelineSegments();
    renderDiarizationStats();
  } else {
    // Cancel edit: Restore current name
    const originalName = state.speakerNames?.[speakerId]?.name
      || `Speaker ${speakerId + 1}`;
    speakerTag.textContent = originalName;
    speakerTag.classList.add('editable-speaker');
  }
};
Key Changes:
- Added else branch for empty input
- delete state.speakerNames[speakerId] removes from state
- Triggers re-render even for empty input (to show default name)
- Cancel (Escape) still restores original name without clearing
Change 2: Enhance Merge Logic (Already Correct!)
File: frontend/app.js:handleSpeakerNameDetection() (lines ~1030-1037)
// Current code is already correct!
const mergedNames = { ...speakerNames };
if (state.speakerNames) {
  Object.entries(state.speakerNames).forEach(([speakerId, info]) => {
    if (info.confidence === 'user') {
      // Preserve user-edited names
      mergedNames[speakerId] = info;
    }
  });
}
Why it's correct:
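- If a speaker was cleared, its key no longer exists, so state.speakerNames[speakerId] reads as undefined
- Object.entries() only returns existing keys, so the loop never visits the cleared speaker
- Auto-detected name fills the gap ✅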
No changes needed here!
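The merge loop stays safe only because a cleared speaker is removed with delete rather than set to undefined. A small check of that distinction, using sample data rather than real application state:

```javascript
const speakerNames = {
  0: { name: 'John', confidence: 'user' },
  1: { name: 'Alice', confidence: 'user' },
};

// Deleting the key removes it from Object.entries() entirely.
delete speakerNames[0];
console.log(Object.keys(speakerNames)); // [ '1' ]

// Assigning undefined keeps the key, and the merge loop's
// `info.confidence` access would then throw a TypeError.
speakerNames[0] = undefined;
let threw = false;
try {
  Object.entries(speakerNames).forEach(([id, info]) => info.confidence);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true
```

If a future change ever needs to keep a placeholder entry for cleared speakers, the loop would have to guard against it, for example with info?.confidence.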
Behavior Examples
Example 1: Clear and Auto-Detect
Initial State:
state.speakerNames = {
0: { name: "John", confidence: "user" },
1: { name: "Alice", confidence: "user" }
}
Transcript:
[John] Hello everyone...
[Alice] Hi there...
[John] Today we'll discuss...
User Action 1: Clear Speaker 0
User clicks "John" tag, deletes all text, presses Enter
state.speakerNames = {
1: { name: "Alice", confidence: "user" }
}
// Speaker 0 removed from state
Transcript:
[Speaker 1] Hello everyone... ← Shows default
[Alice] Hi there... ← User-edited preserved
[Speaker 1] Today we'll discuss... ← Shows default
User Action 2: Click "Detect Speaker Names"
Auto-detection returns:
{ 0: {name: "Dr. Smith", confidence: "high"} }
Merge logic:
- Speaker 0: No user edit → Use "Dr. Smith" ✅
- Speaker 1: User edited → Keep "Alice" ✅
state.speakerNames = {
0: { name: "Dr. Smith", confidence: "high" },
1: { name: "Alice", confidence: "user" }
}
Transcript:
[Dr. Smith] Hello everyone... ← Auto-detected
[Alice] Hi there... ← User-edited preserved
[Dr. Smith] Today we'll discuss... ← Auto-detected
Example 2: Edit, Clear, Edit Again
Initial State:
state.speakerNames = {}
Transcript:
[Speaker 1] Hello...
[Speaker 2] Hi...
Step 1: User edits
Edit Speaker 1 → "Wrong Name"
state.speakerNames = {
0: { name: "Wrong Name", confidence: "user" }
}
Transcript:
[Wrong Name] Hello...
Step 2: User realizes mistake, clears
Edit "Wrong Name" → "" (empty)
state.speakerNames = {} // Speaker 0 removed
Transcript:
[Speaker 1] Hello... ← Back to default
Step 3: User edits correctly
Edit Speaker 1 → "Correct Name"
state.speakerNames = {
0: { name: "Correct Name", confidence: "user" }
}
Transcript:
[Correct Name] Hello...
Example 3: Cancel vs Clear
Scenario A: Cancel Edit (Escape key)
Tag shows "John"
User clicks tag, deletes text, presses Escape
→ Input cancelled, "John" restored
→ state.speakerNames unchanged
Scenario B: Clear Edit (Enter key)
Tag shows "John"
User clicks tag, deletes text, presses Enter
→ Edit saved with empty value
→ state.speakerNames[speakerId] deleted
→ Tag shows "Speaker 1" (default)
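The difference between the two scenarios comes down to which key ends the edit. A minimal sketch of that dispatch, assuming a keydown listener on the edit input; handleEditKey is a hypothetical name, and the stand-in objects exist only so the example runs without a browser:

```javascript
// Hypothetical key handler: Enter commits (an empty value clears the name),
// Escape cancels. `finishEdit` is the function described in this document.
function handleEditKey(event, finishEdit) {
  if (event.key === 'Enter') {
    event.preventDefault();
    finishEdit(true);
    return 'save';
  }
  if (event.key === 'Escape') {
    event.preventDefault();
    finishEdit(false);
    return 'cancel';
  }
  return 'ignore';
}

// Stand-in objects so the sketch runs without a browser.
const calls = [];
const fakeFinishEdit = (save) => calls.push(save);
const fakeEvent = (key) => ({ key, preventDefault() {} });

handleEditKey(fakeEvent('Enter'), fakeFinishEdit);
handleEditKey(fakeEvent('Escape'), fakeFinishEdit);
handleEditKey(fakeEvent('a'), fakeFinishEdit);
console.log(calls); // [ true, false ]
```

In the page this would be wired up with something like input.addEventListener('keydown', (e) => handleEditKey(e, finishEdit)); how the existing code attaches its listeners is not shown in this document.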
Edge Cases
Edge Case 1: Clear Non-Existent Name
Speaker has default name "Speaker 1" (not in state)
User clicks tag, clears (empty input), presses Enter
Check: state.speakerNames[0] exists?
→ NO: Nothing to delete
→ Result: No error, shows "Speaker 1" (unchanged)
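Deleting a key that does not exist is a harmless no-op in JavaScript, so the only thing that needs guarding is the map itself. A short sketch, where clearSpeakerName is a hypothetical helper name and the state objects are sample data:

```javascript
// Hypothetical helper: clear a speaker name without assuming it exists.
function clearSpeakerName(state, speakerId) {
  // Deleting a missing key is a no-op, so only the map itself needs a guard.
  if (state.speakerNames) delete state.speakerNames[speakerId];
}

const emptyState = { speakerNames: {} };
clearSpeakerName(emptyState, 0); // nothing to delete, no error
console.log(Object.keys(emptyState.speakerNames).length); // 0

const uninitialised = { speakerNames: null };
clearSpeakerName(uninitialised, 0); // guarded, does not throw
console.log(uninitialised.speakerNames); // null
```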
Edge Case 2: Clear During Transcription
Live transcription in progress
User clears speaker name
Result:
- Name cleared from state ✅
- Re-render triggered ✅
- New utterances show default name ✅
- Incremental rendering preserved ✅
Edge Case 3: Clear All Names
User clears all speaker names
state.speakerNames = {}
Auto-detection:
- All speakers available for detection ✅
- Can fill all empty names ✅
Edge Case 4: Whitespace-Only Input
User enters " " (spaces only)
Validation: " ".trim() === ""
→ Treated as empty input
→ Name cleared from state ✅
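String.prototype.trim() removes more than plain spaces, but not every invisible character. The check below shows which inputs count as empty; isEmptyName is a hypothetical helper used only for illustration:

```javascript
// Hypothetical helper: the same emptiness test finishEdit applies.
const isEmptyName = (raw) => raw.trim() === '';

console.log(isEmptyName('   '));    // true
console.log(isEmptyName('\t\n'));   // true
console.log(isEmptyName('\u00A0')); // true  (no-break space)
console.log(isEmptyName('\u200B')); // false (zero-width space is not trimmed)
console.log(isEmptyName(' John ')); // false
```

A name consisting only of zero-width characters would therefore be saved as a user edit and render as an apparently blank tag. Whether that case needs handling is left open here.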
Testing Scenarios
✅ Test 1: Clear User-Edited Name
- Edit "Speaker 1" → "John"
- Verify: Tag shows "John"
- Edit "John" → "" (empty)
- Verify: Tag shows "Speaker 1"
- Verify: state.speakerNames[0] is undefined
✅ Test 2: Auto-Detection After Clear
- Edit "Speaker 1" → "Wrong"
- Click "Detect Speaker Names"
- Verify: "Wrong" preserved (not overwritten)
- Clear "Wrong" → ""
- Click "Detect Speaker Names" again
- Verify: Auto-detected name appears
✅ Test 3: Cancel Does Not Clear
- Tag shows "John"
- Click tag, delete text
- Press Escape
- Verify: Tag shows "John" (restored)
- Verify: state.speakerNames[0] unchanged
✅ Test 4: Empty Edit Triggers Re-Render
- Tag shows "John"
- Clear name → ""
- Verify: All tags for speaker 0 show "Speaker 1"
- Verify: Timeline segments updated
- Verify: Stats panel updated
✅ Test 5: Clear and Re-Edit
- Edit "Speaker 1" → "First"
- Clear "First" → ""
- Edit "Speaker 1" → "Second"
- Verify: All tags show "Second"
- Verify: Protected from auto-detection
✅ Test 6: Whitespace Handling
- Edit "Speaker 1" → " " (spaces)
- Press Enter
- Verify: Treated as empty, name cleared
- Verify: Tag shows "Speaker 1"
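Tests 1, 2, and 6 touch only state, so they can be automated without a browser. The following is a sketch rather than the project's test suite: editName, merge, label, and check are hypothetical stand-ins for the edit logic, the merge logic, the tag label lookup, and a minimal assertion, and the detected payload is sample data.

```javascript
// Hypothetical stand-ins for the edit and merge logic, kept DOM-free.
const editName = (names, id, raw) => {
  const name = raw.trim();
  if (name) names[id] = { name, confidence: 'user', reason: 'User edited' };
  else delete names[id];
};
const merge = (names, detected) => {
  const merged = { ...detected };
  for (const [id, info] of Object.entries(names)) {
    if (info.confidence === 'user') merged[id] = info;
  }
  return merged;
};
const label = (names, id) => names[id]?.name || `Speaker ${id + 1}`;
const check = (cond, msg) => { if (!cond) throw new Error(`FAILED: ${msg}`); };

const detected = { 0: { name: 'Dr. Smith', confidence: 'high' } };

// Test 1: clear a user-edited name
let names = {};
editName(names, 0, 'John');
check(label(names, 0) === 'John', 'tag shows John');
editName(names, 0, '');
check(label(names, 0) === 'Speaker 1', 'tag shows default');
check(names[0] === undefined, 'entry removed');

// Test 2: auto-detection after clear
editName(names, 0, 'Wrong');
names = merge(names, detected);
check(names[0].name === 'Wrong', 'user edit preserved');
editName(names, 0, '');
names = merge(names, detected);
check(names[0].name === 'Dr. Smith', 'auto-detected name appears');

// Test 6: whitespace-only input
editName(names, 0, '   ');
check(label(names, 0) === 'Speaker 1', 'whitespace treated as empty');

console.log('all checks passed');
```

Tests 3 and 4 depend on key events and rendering, so they would still need to be run manually or with a DOM test harness.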
User Experience Improvements
Visual Feedback
Consider adding visual hints:
/* Indicate clearable/editable state */
.speaker-tag.editable-speaker {
  cursor: text;
  border-style: dashed; /* Hint: editable */
}
.speaker-tag.editable-speaker:hover::after {
  content: " ✏"; /* Pencil icon */
  opacity: 0.5;
}
Tooltip Enhancement
// In createUtteranceElement()
if (speakerInfo?.confidence === 'user') {
  speakerTag.title = 'User-edited name (click to edit or clear)';
} else if (speakerInfo?.confidence === 'high') {
  speakerTag.title = 'Auto-detected name (click to override)';
} else {
  speakerTag.title = 'Click to edit speaker name';
}
Clear Button (Optional)
Add explicit "Clear" button in edit mode:
<input class="speaker-edit-input" value="John" />
<button class="clear-speaker-btn" title="Clear and allow auto-detection">×</button>
Implementation Checklist
- Analyze current merge logic
- Design clear/reset mechanism
- Document three speaker name states
- Implement empty input handling in startSpeakerEdit()
- Test manual clear functionality
- Test auto-detection after clear
- Test cancel vs clear behavior
- Verify timeline and stats panel sync
- Update documentation
- Commit changes
Files to Modify
/home/luigi/VoxSum/frontend/app.js
- Function: startSpeakerEdit() (lines ~665-690)
- Change: Add else branch for empty input to delete from state
- Impact: ~10 lines modified
No Other Files Required
- Merge logic already correct (no changes needed)
- UI rendering already supports undefined names (shows default)
Performance Considerations
- Delete operation: O(1) - fast
- Re-render trigger: Same as edit (~10-50ms)
- Memory: Reduces state size (removes cleared entries)
Backward Compatibility
- ✅ Existing user-edited names preserved
- ✅ Auto-detection logic unchanged
- ✅ Default name display unchanged
- ✅ No breaking changes to API
Conclusion
The enhancement allows users to intentionally clear speaker names to enable auto-detection, providing a clear "reset" path while maintaining protection for user edits. The implementation is simple (one else branch), robust, and maintains all existing functionality.