
🎵 Custom Audio Player with Visual Timeline

Overview

Complete replacement of the native HTML5 audio player with a custom-built player featuring:

  • ✅ Full-width responsive design
  • ✅ Visual timeline with utterance segments
  • ✅ Color-coded speaker segments (when diarization is enabled)
  • ✅ Enhanced controls (play/pause, volume, seek)
  • ✅ Keyboard shortcuts
  • ✅ All existing functionality preserved

Features

1. Responsive Full-Width Player ✅

The new player automatically fills the available width, providing a better visual experience on all screen sizes.

.audio-player-panel {
  width: 100%;
}

.custom-audio-player {
  width: 100%;
}

2. Visual Timeline with Utterance Segments ✅

Each utterance is visualized as a colored segment in the timeline:

  • Position: Exact start/end time as percentage of total duration
  • Width: Duration of the utterance
  • Hover: Shows speaker name and text preview
  • Click: Seeks to that utterance
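
function renderTimelineSegments() {
  // audio.duration is NaN until metadata has loaded; skip rendering until then
  if (!Number.isFinite(audio.duration) || audio.duration <= 0) return;

  state.utterances.forEach((utt, index) => {
    const startPercent = (utt.start / audio.duration) * 100;
    const endPercent = (utt.end / audio.duration) * 100;
    // Create visual segment spanning startPercent..endPercent...
  });
}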
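
The percentage math above can be pulled into a small pure helper, which makes it easy to check in isolation. This is a minimal sketch rather than the project's actual code: `segmentGeometry` is a hypothetical name, and clamping to the 0 to 100% range is an assumption about how out-of-range timestamps should be treated.

```javascript
// Hypothetical helper: convert an utterance's start/end (in seconds) into
// CSS-ready left/width percentages for a timeline of the given duration.
function segmentGeometry(utt, duration) {
  // Duration is unknown (NaN) before metadata loads; signal "nothing to draw".
  if (!Number.isFinite(duration) || duration <= 0) return null;
  const clamp = (v) => Math.min(100, Math.max(0, v));
  const left = clamp((utt.start / duration) * 100);
  const right = clamp((utt.end / duration) * 100);
  return { left, width: Math.max(0, right - left) };
}

const geo = segmentGeometry({ start: 30, end: 45 }, 120);
console.log(geo); // { left: 25, width: 12.5 }
```

The result could then be applied as `el.style.left = geo.left + '%'` and `el.style.width = geo.width + '%'`.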

3. Speaker Color-Coding ✅

When speaker diarization is enabled, each speaker gets a unique color:

  • Speaker 0: Red (#ef4444)
  • Speaker 1: Blue (#3b82f6)
  • Speaker 2: Green (#10b981)
  • Speaker 3: Amber (#f59e0b)
  • Speaker 4: Purple (#8b5cf6)
  • Speaker 5: Pink (#ec4899)
  • Speaker 6: Teal (#14b8a6)
  • Speaker 7: Orange (#f97316)
  • Speaker 8: Cyan (#06b6d4)
  • Speaker 9: Lime (#84cc16)

.speaker-0 { background-color: #ef4444; }
.speaker-1 { background-color: #3b82f6; }
/* ... etc ... */
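
With ten color classes, a recording with more than ten speakers, or one without diarization, needs some rule for choosing a class. One possible approach is sketched below. The `speakerClass` name, the `speaker-default` class, and the wrap-around behavior are all assumptions for illustration, not something the source confirms.

```javascript
const SPEAKER_COLOR_COUNT = 10; // .speaker-0 through .speaker-9

// Hypothetical helper: pick the CSS class for a timeline segment.
// Assumption: utterances without a valid speaker index get a neutral class,
// and speaker indexes past 9 reuse colors by wrapping around.
function speakerClass(speaker) {
  if (!Number.isInteger(speaker) || speaker < 0) return 'speaker-default';
  return `speaker-${speaker % SPEAKER_COLOR_COUNT}`;
}

console.log(speakerClass(1));         // "speaker-1"
console.log(speakerClass(12));        // "speaker-2"
console.log(speakerClass(undefined)); // "speaker-default"
```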

4. Active Segment Highlighting ✅

The currently playing utterance segment is highlighted in the timeline:

  • Higher opacity
  • Inner shadow effect
  • Synchronized with transcript highlighting
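
function updateActiveSegment() {
  const currentIndex = findActiveUtterance(audio.currentTime);
  // Clear the previous highlight so only one segment is active at a time
  document.querySelectorAll('.timeline-segment.active')
    .forEach((el) => el.classList.remove('active'));
  const activeSegment = document.querySelector(`.timeline-segment[data-index="${currentIndex}"]`);
  // No match when playback is between utterances
  if (activeSegment) activeSegment.classList.add('active');
}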
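
The lookup behind `findActiveUtterance` is not shown in this document. A minimal sketch of how such a lookup could work is given below as a pure function. It assumes utterances are sorted by start time and do not overlap, and it takes the utterance list as an explicit argument, which differs from the single-argument call above. The name `findActiveUtteranceIndex` is hypothetical.

```javascript
// Sketch of an active-utterance lookup using binary search.
// Assumption: utterances are sorted by start time and do not overlap.
function findActiveUtteranceIndex(utterances, time) {
  let lo = 0;
  let hi = utterances.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const utt = utterances[mid];
    if (time < utt.start) hi = mid - 1;
    else if (time >= utt.end) lo = mid + 1;
    else return mid; // start <= time < end
  }
  return -1; // silence or a gap between utterances
}

const utterances = [
  { start: 0, end: 2.5 },
  { start: 3, end: 7 },
  { start: 7, end: 9.2 },
];
console.log(findActiveUtteranceIndex(utterances, 4));    // 1
console.log(findActiveUtteranceIndex(utterances, 2.75)); // -1
console.log(findActiveUtteranceIndex(utterances, 7));    // 2
```

Binary search keeps the per-`timeupdate` cost logarithmic in the number of utterances, which may matter for long recordings. A linear scan would also work for short transcripts.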

5. Enhanced Controls ✅

Play/Pause Button:

  • Circular gradient button
  • Smooth icon transition
  • Hover effects with glow

Timeline:

  • Click anywhere to seek
  • Drag handle to seek
  • Visual progress bar
  • Segments overlay

Volume Control:

  • Mute/unmute button with dynamic icon
  • Slider for precise control
  • Smooth animations

Time Displays:

  • Current time (left)
  • Total duration (right)
  • Tabular numbers for consistent width
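
The time labels use an `m:ss` style (the markup starts them at `0:00`). A formatter along the following lines would produce that format. `formatTime` is a hypothetical name, and switching to `h:mm:ss` for audio longer than one hour is an assumption.

```javascript
// Hypothetical formatter for the time labels ("0:00", "12:34", "1:02:03").
function formatTime(seconds) {
  // duration is NaN before metadata loads; show a neutral placeholder
  if (!Number.isFinite(seconds) || seconds < 0) return '0:00';
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = String(total % 60).padStart(2, '0');
  return h > 0 ? `${h}:${String(m).padStart(2, '0')}:${s}` : `${m}:${s}`;
}

console.log(formatTime(65));   // "1:05"
console.log(formatTime(3723)); // "1:02:03"
console.log(formatTime(NaN));  // "0:00"
```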

6. Keyboard Shortcuts ✅

  • Space: Play/Pause
  • Arrow Left: Rewind 5 seconds
  • Arrow Right: Forward 5 seconds
  • Only active when not typing in input/textarea

document.addEventListener('keydown', (e) => {
  // Ignore shortcuts while the user is typing
  if (e.target.tagName === 'INPUT' || e.target.tagName === 'TEXTAREA') return;
  
  if (e.code === 'Space') {
    e.preventDefault(); // Stop the page from scrolling
    audio.paused ? audio.play() : audio.pause();
  }
  // ... arrow keys ...
});
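
The arrow-key branch is elided above. One way to compute the new position while staying inside the valid range is sketched here. `seekTarget` and `SEEK_STEP_SECONDS` are hypothetical names, and treating an unknown duration as "do not move forward" is an assumption.

```javascript
const SEEK_STEP_SECONDS = 5; // matches the 5-second step listed above

// Hypothetical helper: compute the new playback position for an arrow key.
// `code` is a KeyboardEvent.code value such as 'ArrowLeft' or 'ArrowRight'.
function seekTarget(currentTime, duration, code) {
  const delta =
    code === 'ArrowLeft' ? -SEEK_STEP_SECONDS :
    code === 'ArrowRight' ? SEEK_STEP_SECONDS : 0;
  // If duration is not known yet, do not seek past the current position.
  const max = Number.isFinite(duration) ? duration : currentTime;
  return Math.min(max, Math.max(0, currentTime + delta));
}

console.log(seekTarget(3, 60, 'ArrowLeft'));   // 0
console.log(seekTarget(58, 60, 'ArrowRight')); // 60
console.log(seekTarget(20, 60, 'ArrowRight')); // 25
```

Inside the `keydown` handler this could be used as `audio.currentTime = seekTarget(audio.currentTime, audio.duration, e.code)`.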

Technical Implementation

HTML Structure

<section class="panel audio-player-panel">
  <h2>Audio Player</h2>
  <div class="custom-audio-player">
    <audio id="audio-player" preload="auto"></audio>
    
    <div class="player-controls">
      <!-- Play/Pause Button -->
      <button id="play-pause-btn">
        <span class="play-icon">▶</span>
        <span class="pause-icon hidden">⏸</span>
      </button>
      
      <!-- Current Time -->
      <span id="current-time">0:00</span>
      
      <!-- Timeline Container -->
      <div class="timeline-container">
        <canvas id="waveform-canvas"></canvas>
        <div id="timeline-bar">
          <div id="timeline-progress"></div>
          <div id="timeline-segments"></div>
          <div id="timeline-handle"></div>
        </div>
      </div>
      
      <!-- Duration -->
      <span id="duration-time">0:00</span>
      
      <!-- Volume Control -->
      <div class="volume-control">
        <button id="volume-btn">🔊</button>
        <input id="volume-slider" type="range" />
      </div>
    </div>
  </div>
</section>

CSS Styling

Responsive Layout:

.player-controls {
  display: flex;
  align-items: center;
  gap: 1rem;
}

.timeline-container {
  flex: 1; /* Takes all available space */
  height: 48px;
}

@media (max-width: 1100px) {
  .player-controls {
    flex-wrap: wrap;
  }
  
  .timeline-container {
    width: 100%;
    flex-basis: 100%; /* Full width on mobile */
  }
}

Timeline Segments:

.timeline-segment {
  position: absolute;
  height: 100%;
  opacity: 0.4;
  transition: opacity 0.2s ease;
}

.timeline-segment.active {
  opacity: 0.8;
  box-shadow: inset 0 0 10px rgba(255, 255, 255, 0.2);
}

JavaScript Functions

1. Initialization:

function initCustomAudioPlayer() {
  // Set up event listeners for:
  // - Play/Pause
  // - Timeline seeking (click & drag)
  // - Volume control
  // - Keyboard shortcuts
  // - Time updates
}

2. Timeline Rendering:

function renderTimelineSegments() {
  // Clear existing segments
  // For each utterance:
  //   - Calculate position as percentage
  //   - Apply speaker color
  //   - Add tooltip with preview
  //   - Make clickable for seeking
}

3. Position Updates:

function updateTimelinePosition() {
  // Before metadata loads, duration is NaN; fall back to 0%
  const percent = audio.duration ? (audio.currentTime / audio.duration) * 100 : 0;
  timelineProgress.style.width = `${percent}%`;
  timelineHandle.style.left = `${percent}%`;
}

4. Seeking:

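function seekToPosition(e) {
  // Assigning a non-finite value to currentTime throws, so wait for metadata
  if (!Number.isFinite(audio.duration)) return;
  const rect = timelineBar.getBoundingClientRect();
  // Clamp to [0, 1] so dragging past either end of the bar stays in range
  const percent = Math.min(1, Math.max(0, (e.clientX - rect.left) / rect.width));
  audio.currentTime = percent * audio.duration;
}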

Integration with Existing Features

1. Bidirectional Synchronization ✅

Player → Transcript:

// Already working via timeupdate event
audio.addEventListener('timeupdate', () => {
  updateActiveUtterance();
  updateActiveSegment(); // NEW: Also update timeline
});

Transcript → Player:

// Click on utterance still works
// Click on timeline segment ALSO works now
segment.addEventListener('click', () => {
  seekToTime(utt.start);
});

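2. Drag-to-Seek ✅

Drag-to-seek now lives entirely on the custom timeline:

  • Native progress bar: Removed (replaced by the custom timeline)
  • Custom timeline: Click and drag supported

let isDragging = false;

timelineBar.addEventListener('mousedown', (e) => {
  isDragging = true;
  seekToPosition(e);
});

document.addEventListener('mousemove', (e) => {
  if (isDragging) seekToPosition(e);
});

// End the drag wherever the button is released, even outside the timeline
document.addEventListener('mouseup', () => {
  isDragging = false;
});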

3. Incremental Rendering ✅

Timeline segments are updated when transcript changes:

function renderTranscript() {
  // ... existing logic ...
  
  // NEW: Update timeline after transcript changes
  renderTimelineSegments();
}

Visual Design

Color Palette

Player Background: rgba(15, 23, 42, 0.5) - Semi-transparent dark
Timeline Base: rgba(15, 23, 42, 0.6) - Darker for contrast
Progress: linear-gradient(90deg, rgba(56, 189, 248, 0.3), rgba(129, 140, 248, 0.3)) - Blue gradient
Handle: #38bdf8 - Bright cyan
Active Segment: opacity: 0.8 + inner shadow

Gradients

Play/Pause Button:

background: linear-gradient(135deg, #38bdf8 0%, #818cf8 100%);

Hover Effects:

box-shadow: 0 0 20px rgba(56, 189, 248, 0.4);

Performance Considerations

1. DOM Manipulation

Segments created once per utterance:

  • Uses DocumentFragment for batch insertion
  • Only re-renders when utterances change
  • Not re-rendered on every timeupdate (too expensive)

Active segment update:

  • Only toggles a CSS class (cheap)
  • No elements are created or removed during playback
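
The batch insertion described above could look roughly like the following sketch. It is not the project's actual `renderTimelineSegments()`. `buildSegmentFragment` is a hypothetical name, the `doc` parameter (normally the global `document`) is passed in only to keep the function easy to test, and the utterance shape (`start`, `end` in seconds) is assumed.

```javascript
// Sketch of batched segment creation with a DocumentFragment.
// `doc` is normally the global `document`; `duration` is in seconds.
function buildSegmentFragment(doc, utterances, duration) {
  const fragment = doc.createDocumentFragment();
  utterances.forEach((utt, index) => {
    const el = doc.createElement('div');
    el.className = 'timeline-segment';
    el.dataset.index = String(index); // read back by the active-segment lookup
    el.style.left = `${(utt.start / duration) * 100}%`;
    el.style.width = `${((utt.end - utt.start) / duration) * 100}%`;
    fragment.appendChild(el);
  });
  // The caller inserts everything in one step, for example:
  //   container.replaceChildren(fragment);
  return fragment;
}
```

Because all segments are attached to the fragment first, the live DOM is touched once per render instead of once per utterance.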

2. Event Listeners

Throttling not needed:

  • timeupdate is throttled by the browser (typically ~4x/second; the HTML spec allows roughly every 15 to 250 ms)
  • Segment updates use simple class toggle
  • No performance issues observed

3. Responsive Behavior

CSS-based responsiveness:

  • No JavaScript media queries
  • Pure CSS flexbox
  • Smooth transitions

Browser Compatibility

| Feature | Support |
|---|---|
| HTML5 Audio | ✅ All modern browsers |
| Flexbox Layout | ✅ All modern browsers |
| CSS Gradients | ✅ All modern browsers |
| input[type="range"] | ✅ All modern browsers |
| DocumentFragment | ✅ All modern browsers |
| Keyboard Events | ✅ All modern browsers |

Future Enhancements (Optional)

1. Waveform Visualization

The canvas element (#waveform-canvas) is currently included in the markup but unused. A waveform could be added along these lines:

function drawWaveform() {
  // Analyze audio buffer
  // Draw waveform on canvas
  // Update on window resize
}
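
For the "analyze audio buffer" step, one common approach is to reduce the decoded samples to one peak value per horizontal bucket before drawing. The sketch below shows only that reduction. `computePeaks` is a hypothetical name, and the input is assumed to be a `Float32Array` of samples in the range -1 to 1, such as the array returned by `AudioBuffer.getChannelData(0)`.

```javascript
// Hypothetical helper: reduce raw samples to one peak per horizontal bucket.
// Any trailing samples that do not fill a whole bucket are ignored.
function computePeaks(samples, bucketCount) {
  const peaks = new Float32Array(bucketCount);
  const bucketSize = Math.max(1, Math.floor(samples.length / bucketCount));
  for (let b = 0; b < bucketCount; b++) {
    const start = b * bucketSize;
    const end = Math.min(samples.length, start + bucketSize);
    let peak = 0;
    for (let i = start; i < end; i++) {
      peak = Math.max(peak, Math.abs(samples[i]));
    }
    peaks[b] = peak;
  }
  return peaks;
}

const samples = Float32Array.from([0, 0.5, -0.25, 0.75, -1, 0.125, 0, 0.25]);
console.log(Array.from(computePeaks(samples, 4))); // [0.5, 0.75, 1, 0.25]
```

Each peak could then be drawn as a vertical bar whose height is the peak value times half the canvas height.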

2. Playback Speed Control

<select id="playback-rate">
  <option value="0.5">0.5x</option>
  <option value="1" selected>1x</option>
  <option value="1.5">1.5x</option>
  <option value="2">2x</option>
</select>
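
Wiring the select to the player would mostly mean setting `playbackRate` on the audio element. A minimal sketch follows. `applyPlaybackRate` and `ALLOWED_RATES` are hypothetical names, and rejecting values outside the listed options is an assumption.

```javascript
const ALLOWED_RATES = [0.5, 1, 1.5, 2]; // mirrors the <option> values above

// Hypothetical handler body: apply a rate chosen in the <select>.
// `audio` only needs a writable playbackRate, like an HTMLMediaElement.
function applyPlaybackRate(audio, value) {
  const rate = Number(value);
  if (!ALLOWED_RATES.includes(rate)) return false; // ignore unexpected values
  audio.playbackRate = rate;
  return true;
}

const fakeAudio = { playbackRate: 1 }; // stand-in for the real <audio> element
console.log(applyPlaybackRate(fakeAudio, '1.5'), fakeAudio.playbackRate);  // true 1.5
console.log(applyPlaybackRate(fakeAudio, 'fast'), fakeAudio.playbackRate); // false 1.5
```

In the page, this could be called from a `change` listener on the select with `e.target.value` as the second argument.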

3. Loop/Repeat Utterance

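function loopUtterance(index) {
  const utt = state.utterances[index];
  const onTimeUpdate = () => {
    if (audio.currentTime >= utt.end) {
      audio.currentTime = utt.start;
    }
  };
  audio.addEventListener('timeupdate', onTimeUpdate);
  // Return a function that stops looping, so handlers do not pile up
  return () => audio.removeEventListener('timeupdate', onTimeUpdate);
}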

4. Bookmark/Marker System

Allow users to add markers at specific times for later reference.


Testing Checklist

Functionality Tests

  • ✅ Play/Pause button works
  • ✅ Timeline click seeks correctly
  • ✅ Timeline drag seeks correctly
  • ✅ Volume slider works
  • ✅ Mute button toggles correctly
  • ✅ Time displays update
  • ✅ Segments render with correct positions
  • ✅ Speaker colors applied correctly
  • ✅ Active segment highlights correctly
  • ✅ Clicking segment seeks to utterance
  • ✅ Keyboard shortcuts work
  • ✅ Transcript sync still works
  • ✅ Click-to-seek from transcript works

Responsive Tests

  • ✅ Full width on desktop
  • ✅ Timeline wraps on mobile
  • ✅ Controls remain usable on small screens
  • ✅ Touch events work on mobile

Edge Cases

  • ✅ No utterances: Timeline empty
  • ✅ Many utterances (100+): Performance OK
  • ✅ Long audio (1+ hour): Segments visible
  • ✅ Short utterances (<1s): Still clickable
  • ✅ No diarization: Segments use default color

Summary

What Changed

| Component | Before | After |
|---|---|---|
| Player Width | Default (varies) | Full width (100%) |
| Timeline | Native progress bar | Custom visual timeline |
| Utterance Visualization | None | Color-coded segments |
| Speaker Colors | None | 10 unique colors |
| Controls | Native HTML5 | Custom styled |
| Keyboard Support | None | Space, Arrows |
| Mobile Support | Basic | Optimized responsive |

What Stayed the Same

✅ All existing features preserved:

  • Bidirectional sync player ↔ transcript
  • Drag-to-seek functionality
  • Click utterance to seek
  • Edit functionality
  • Real-time highlighting

New Capabilities

🆕 Timeline segments visualization
🆕 Speaker color-coding
🆕 Click segments to seek
🆕 Keyboard shortcuts
🆕 Enhanced UX with animations
🆕 Responsive full-width layout


Files Modified

  1. frontend/index.html

    • Replaced native <audio controls> with custom player structure
    • Added timeline container with canvas and segments
  2. frontend/styles.css

    • Added ~250 lines of custom player styling
    • Responsive media queries
    • Speaker color classes
    • Smooth animations
  3. frontend/app.js

    • Added initCustomAudioPlayer() function
    • Added renderTimelineSegments() function
    • Added updateActiveSegment() function
    • Added seekToPosition() helper
    • Updated renderTranscript() to update timeline
    • Updated initAudioInteractions() to sync timeline

Result

🎉 A modern, feature-rich audio player that provides visual feedback about the audio structure while maintaining all existing functionality!

The timeline gives users an instant overview of:

  • Where utterances are located
  • Which parts have which speakers
  • Current playback position
  • Easy navigation by clicking segments

Perfect for long-form audio with multiple speakers! 🎙️