Consolidate tests under tests/, add LLM default tests with opt-out flag, model selection, README update
Browse files- README.md +80 -0
- requirements.txt +2 -1
- src/diarization.py +122 -26
- tests/conftest.py +62 -0
- tests/test_diarization_minimal.py +136 -0
- tests/test_multilingual.py +74 -0
- tests/test_multilingual_quick.py +36 -0
- tests/test_summary_language.py +33 -0
README.md
CHANGED
|
@@ -95,6 +95,86 @@ voxsum-studio/
|
|
| 95 |
- Large audio files may take longer to process, especially in a resource-constrained environment like Hugging Face Spaces.
|
| 96 |
- YouTube audio fetching requires a valid URL and may be subject to rate limits or availability.
|
| 97 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 98 |
## Contributing
|
| 99 |
Contributions are welcome! To contribute:
|
| 100 |
1. Fork the repository on Hugging Face.
|
|
|
|
| 95 |
- Large audio files may take longer to process, especially in a resource-constrained environment like Hugging Face Spaces.
|
| 96 |
- YouTube audio fetching requires a valid URL and may be subject to rate limits or availability.
|
| 97 |
|
| 98 |
+
## Tests
|
| 99 |
+
|
| 100 |
+
### Overview
|
| 101 |
+
LLM tests are now part of the default test run because multilingual summarization and title generation are core to VoxSum’s value.
|
| 102 |
+
|
| 103 |
+
Test categories:
|
| 104 |
+
1. LLM-dependent tests (default ON): multilingual summarization, title generation, language consistency.
|
| 105 |
+
2. Lightweight diarization tests: fast heuristics & structural checks.
|
| 106 |
+
|
| 107 |
+
If you need a fast pass without loading models (e.g. in a tiny CI runner), you can explicitly skip LLM tests (see below).
|
| 108 |
+
|
| 109 |
+
### Running all tests (default, includes LLM)
|
| 110 |
+
Install dependencies then run:
|
| 111 |
+
|
| 112 |
+
```
|
| 113 |
+
pip install -r requirements.txt
|
| 114 |
+
pytest -q
|
| 115 |
+
```
|
| 116 |
+
|
| 117 |
+
### Skipping LLM tests (opt-out)
|
| 118 |
+
If you only want the lightweight diarization tests:
|
| 119 |
+
```
|
| 120 |
+
export VOXSUM_SKIP_LLM_TESTS=1
|
| 121 |
+
pytest -q
|
| 122 |
+
```
|
| 123 |
+
This will module-skip:
|
| 124 |
+
- `test_multilingual.py`
|
| 125 |
+
- `test_multilingual_quick.py`
|
| 126 |
+
- `test_summary_language.py`
|
| 127 |
+
|
| 128 |
+
These tests exercise:
|
| 129 |
+
- Multilingual summarization pipeline (`summarize_transcript`)
|
| 130 |
+
- Title generation (`generate_title`)
|
| 131 |
+
- Language consistency heuristics
|
| 132 |
+
|
| 133 |
+
### Mocking strategy (opt-out mode)
|
| 134 |
+
`tests/conftest.py` activates a lightweight mock of the LLM interface only when `VOXSUM_SKIP_LLM_TESTS=1`:
|
| 135 |
+
- Replaces `get_llm()` with a dummy object.
|
| 136 |
+
- Avoids native model loading cost.
|
| 137 |
+
- Provides deterministic minimal outputs for structural assertions.
|
| 138 |
+
|
| 139 |
+
### Minimal diarization sanity test
|
| 140 |
+
File: `tests/test_diarization_minimal.py`
|
| 141 |
+
|
| 142 |
+
It validates four scenarios:
|
| 143 |
+
- Single segment
|
| 144 |
+
- Two very similar segments (should unify speaker identity)
|
| 145 |
+
- Two dissimilar segments (can diverge; heuristic tolerant)
|
| 146 |
+
- Three segments (granularity preservation path)
|
| 147 |
+
|
| 148 |
+
The test harness:
|
| 149 |
+
- Uses a mock embedding extractor (no external model downloads).
|
| 150 |
+
- Exercises the small-`n` heuristic path (<3 embeddings) and the adaptive clustering interface.
|
| 151 |
+
|
| 152 |
+
Run directly if desired:
|
| 153 |
+
```
|
| 154 |
+
python3 tests/test_diarization_minimal.py
|
| 155 |
+
```
|
| 156 |
+
|
| 157 |
+
### Troubleshooting
|
| 158 |
+
| Symptom | Likely Cause | Fix |
|
| 159 |
+
|---------|--------------|-----|
|
| 160 |
+
| Segmentation fault during tests | Native model resource issue | Temporarily `export VOXSUM_SKIP_LLM_TESTS=1` to isolate; verify `llama_cpp` install / model size |
|
| 161 |
+
| LLM tests unexpectedly skipped | You left skip var set | `unset VOXSUM_SKIP_LLM_TESTS`; re-run tests |
|
| 162 |
+
| Slow startup | Large GGUF model download/load | Choose a smaller model in `available_gguf_llms` |
|
| 163 |
+
| Mock not applied (you wanted skip) | Forgot to set skip var | `export VOXSUM_SKIP_LLM_TESTS=1` |
|
| 164 |
+
|
| 165 |
+
### Adding new tests
|
| 166 |
+
When adding tests that touch summarization or title generation:
|
| 167 |
+
1. Assume they run by default; only guard them with the skip variable if they’re extremely slow or redundant.
|
| 168 |
+
2. Keep logic deterministic—avoid external network calls beyond local model loading.
|
| 169 |
+
3. For structure-only assertions, instruct contributors they can run with `VOXSUM_SKIP_LLM_TESTS=1` for speed.
|
| 170 |
+
|
| 171 |
+
### CI Recommendation
|
| 172 |
+
Two useful CI lanes:
|
| 173 |
+
1. Full (default): `pytest -q` (includes LLM tests)
|
| 174 |
+
2. Fast lane (optional): `VOXSUM_SKIP_LLM_TESTS=1 pytest -q` for quick structural feedback.
|
| 175 |
+
|
| 176 |
+
Run the fast lane on every commit if startup time is critical; schedule the full lane on PR and nightly builds.
|
| 177 |
+
|
| 178 |
## Contributing
|
| 179 |
Contributions are welcome! To contribute:
|
| 180 |
1. Fork the repository on Hugging Face.
|
requirements.txt
CHANGED
|
@@ -19,4 +19,5 @@ uvicorn[standard]
|
|
| 19 |
python-multipart
|
| 20 |
jinja2
|
| 21 |
aiofiles
|
| 22 |
-
langchain
|
|
|
|
|
|
| 19 |
python-multipart
|
| 20 |
jinja2
|
| 21 |
aiofiles
|
| 22 |
+
langchain
|
| 23 |
+
pytest
|
src/diarization.py
CHANGED
|
@@ -14,14 +14,32 @@ OPTIMIZED MODEL: 3dspeaker_campplus_zh_en_advanced
|
|
| 14 |
|
| 15 |
import os
|
| 16 |
import numpy as np
|
| 17 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 18 |
from pathlib import Path
|
| 19 |
-
from typing import List, Tuple, Optional, Callable, Dict, Any
|
| 20 |
import logging
|
| 21 |
from .utils import get_writable_model_dir, num_vcpus
|
| 22 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 23 |
import shutil
|
| 24 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 25 |
|
| 26 |
# Import the improved diarization pipeline (robust: search repo tree)
|
| 27 |
try:
|
|
@@ -165,7 +183,7 @@ def perform_speaker_diarization_on_utterances(
|
|
| 165 |
embedding_extractor: object,
|
| 166 |
config_dict: dict,
|
| 167 |
progress_callback: Optional[Callable] = None
|
| 168 |
-
) -> List[Tuple[float, float, int]]:
|
| 169 |
"""
|
| 170 |
Perform speaker diarization using existing ASR utterance segments
|
| 171 |
This avoids double segmentation by reusing Silero VAD results
|
|
@@ -234,9 +252,15 @@ def perform_speaker_diarization_on_utterances(
|
|
| 234 |
|
| 235 |
try:
|
| 236 |
# Extract embedding using Sherpa-ONNX with proper stream API
|
|
|
|
|
|
|
| 237 |
stream = embedding_extractor.create_stream()
|
| 238 |
-
stream
|
| 239 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 240 |
embedding = embedding_extractor.compute(stream)
|
| 241 |
|
| 242 |
if embedding is not None and len(embedding) > 0:
|
|
@@ -261,9 +285,42 @@ def perform_speaker_diarization_on_utterances(
|
|
| 261 |
# Convert embeddings to numpy array
|
| 262 |
embeddings_array = np.array(embeddings)
|
| 263 |
print(f"✅ DEBUG: Embeddings array shape: {embeddings_array.shape}")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 264 |
|
| 265 |
# Use enhanced diarization if available
|
| 266 |
-
if ENHANCED_DIARIZATION_AVAILABLE:
|
| 267 |
print("🚀 Using enhanced diarization with adaptive clustering...")
|
| 268 |
logger.info("🚀 Using enhanced adaptive clustering...")
|
| 269 |
|
|
@@ -314,15 +371,28 @@ def perform_speaker_diarization_on_utterances(
|
|
| 314 |
diarization_result = []
|
| 315 |
for utt in enhanced_utterances:
|
| 316 |
diarization_result.append((utt['start'], utt['end'], utt['speaker']))
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 317 |
|
| 318 |
if progress_callback:
|
| 319 |
progress_callback(1.0) # 100% complete
|
| 320 |
yield 1.0
|
| 321 |
-
|
| 322 |
print(f"✅ DEBUG: Enhanced result - {n_speakers} speakers, {len(diarization_result)} segments")
|
| 323 |
logger.info(f"🎭 Enhanced clustering completed! Detected {n_speakers} speakers with {confidence} confidence")
|
| 324 |
-
|
| 325 |
-
|
|
|
|
| 326 |
|
| 327 |
except Exception as e:
|
| 328 |
logger.error(f"❌ Enhanced diarization failed: {e}")
|
|
@@ -333,17 +403,20 @@ def perform_speaker_diarization_on_utterances(
|
|
| 333 |
logger.warning("⚠️ Using fallback clustering")
|
| 334 |
print("⚠️ Using fallback clustering")
|
| 335 |
|
| 336 |
-
|
| 337 |
-
|
| 338 |
-
|
|
|
|
|
|
|
|
|
|
| 339 |
try:
|
| 340 |
while True:
|
| 341 |
p = next(gen)
|
| 342 |
yield p
|
| 343 |
except StopIteration as e:
|
| 344 |
diarization_result = e.value
|
| 345 |
-
|
| 346 |
-
return
|
| 347 |
|
| 348 |
except Exception as e:
|
| 349 |
error_msg = f"❌ Speaker diarization failed: {e}"
|
|
@@ -537,17 +610,38 @@ def faiss_clustering(embeddings: np.ndarray,
|
|
| 537 |
n_samples, dim = embeddings.shape
|
| 538 |
n_clusters = config_dict['num_speakers']
|
| 539 |
if n_clusters == -1:
|
| 540 |
-
#
|
| 541 |
-
|
| 542 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 543 |
for k in range(2, max_k + 1):
|
| 544 |
-
|
| 545 |
-
|
| 546 |
-
|
| 547 |
-
|
| 548 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 549 |
if sil > best_score:
|
| 550 |
-
best_score, best_k, best_labels = sil, k,
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 551 |
labels = best_labels
|
| 552 |
else:
|
| 553 |
kmeans = faiss.Kmeans(dim, min(n_clusters, n_samples), niter=20, verbose=False, seed=42)
|
|
@@ -559,10 +653,12 @@ def faiss_clustering(embeddings: np.ndarray,
|
|
| 559 |
progress_callback(1.0)
|
| 560 |
yield 1.0
|
| 561 |
|
| 562 |
-
num_speakers = len(set(labels))
|
| 563 |
print(f"✅ DEBUG: FAISS clustering — {num_speakers} speakers, {len(utterances)} segments")
|
| 564 |
logger.info(f"🎭 FAISS clustering completed! Detected {num_speakers} speakers")
|
| 565 |
|
|
|
|
|
|
|
| 566 |
return [(start, end, int(lbl)) for (start, end, _), lbl in zip(utterances, labels)]
|
| 567 |
|
| 568 |
|
|
|
|
| 14 |
|
| 15 |
import os
|
| 16 |
import numpy as np
|
| 17 |
+
try:
|
| 18 |
+
import sherpa_onnx # type: ignore
|
| 19 |
+
except Exception: # pragma: no cover
|
| 20 |
+
class _SherpaStub: # minimal stub to allow tests without the dependency
|
| 21 |
+
class SpeakerEmbeddingExtractorConfig: # noqa: D401
|
| 22 |
+
def __init__(self, *args, **kwargs):
|
| 23 |
+
pass
|
| 24 |
+
class SpeakerEmbeddingExtractor:
|
| 25 |
+
def __init__(self, *args, **kwargs):
|
| 26 |
+
raise RuntimeError("sherpa_onnx not installed; real embedding extraction unavailable")
|
| 27 |
+
sherpa_onnx = _SherpaStub() # type: ignore
|
| 28 |
from pathlib import Path
|
| 29 |
+
from typing import List, Tuple, Optional, Callable, Dict, Any, Generator
|
| 30 |
import logging
|
| 31 |
from .utils import get_writable_model_dir, num_vcpus
|
| 32 |
+
try: # Optional dependency
|
| 33 |
+
from huggingface_hub import hf_hub_download # type: ignore
|
| 34 |
+
except Exception: # pragma: no cover
|
| 35 |
+
def hf_hub_download(*args, **kwargs): # minimal stub
|
| 36 |
+
raise RuntimeError("huggingface_hub not installed; model download unavailable")
|
| 37 |
import shutil
|
| 38 |
+
try: # Optional dependency
|
| 39 |
+
from sklearn.metrics import silhouette_score # type: ignore
|
| 40 |
+
except Exception: # pragma: no cover
|
| 41 |
+
def silhouette_score(*args, **kwargs):
|
| 42 |
+
return -1.0
|
| 43 |
|
| 44 |
# Import the improved diarization pipeline (robust: search repo tree)
|
| 45 |
try:
|
|
|
|
| 183 |
embedding_extractor: object,
|
| 184 |
config_dict: dict,
|
| 185 |
progress_callback: Optional[Callable] = None
|
| 186 |
+
) -> Generator[float | List[Tuple[float, float, int]], None, List[Tuple[float, float, int]]]:
|
| 187 |
"""
|
| 188 |
Perform speaker diarization using existing ASR utterance segments
|
| 189 |
This avoids double segmentation by reusing Silero VAD results
|
|
|
|
| 252 |
|
| 253 |
try:
|
| 254 |
# Extract embedding using Sherpa-ONNX with proper stream API
|
| 255 |
+
if not hasattr(embedding_extractor, "create_stream"):
|
| 256 |
+
raise RuntimeError("Embedding extractor missing create_stream(); sherpa_onnx not available?")
|
| 257 |
stream = embedding_extractor.create_stream()
|
| 258 |
+
if hasattr(stream, "accept_waveform"):
|
| 259 |
+
stream.accept_waveform(sample_rate, segment)
|
| 260 |
+
if hasattr(stream, "input_finished"):
|
| 261 |
+
stream.input_finished()
|
| 262 |
+
if not hasattr(embedding_extractor, "compute"):
|
| 263 |
+
raise RuntimeError("Embedding extractor missing compute(); sherpa_onnx not available?")
|
| 264 |
embedding = embedding_extractor.compute(stream)
|
| 265 |
|
| 266 |
if embedding is not None and len(embedding) > 0:
|
|
|
|
| 285 |
# Convert embeddings to numpy array
|
| 286 |
embeddings_array = np.array(embeddings)
|
| 287 |
print(f"✅ DEBUG: Embeddings array shape: {embeddings_array.shape}")
|
| 288 |
+
n_embeddings = embeddings_array.shape[0]
|
| 289 |
+
|
| 290 |
+
# Cas très faible nombre de segments: éviter tout clustering complexe
|
| 291 |
+
if n_embeddings < 3:
|
| 292 |
+
print("⚠️ DEBUG: Moins de 3 segments – utilisation d'une heuristique simple sans clustering")
|
| 293 |
+
assignments: List[Tuple[float, float, int]] = []
|
| 294 |
+
if n_embeddings == 1:
|
| 295 |
+
(s, e, _t) = valid_utterances[0]
|
| 296 |
+
assignments.append((s, e, 0))
|
| 297 |
+
elif n_embeddings == 2:
|
| 298 |
+
try:
|
| 299 |
+
from sklearn.metrics.pairwise import cosine_similarity # type: ignore
|
| 300 |
+
sim = float(cosine_similarity(embeddings_array[0:1], embeddings_array[1:2])[0, 0])
|
| 301 |
+
except Exception:
|
| 302 |
+
a = embeddings_array[0].astype(float)
|
| 303 |
+
b = embeddings_array[1].astype(float)
|
| 304 |
+
denom = (np.linalg.norm(a) * np.linalg.norm(b)) or 1e-9
|
| 305 |
+
sim = float(np.dot(a, b) / denom)
|
| 306 |
+
(s1, e1, _t1) = valid_utterances[0]
|
| 307 |
+
(s2, e2, _t2) = valid_utterances[1]
|
| 308 |
+
if sim >= 0.80:
|
| 309 |
+
assignments.append((s1, e1, 0))
|
| 310 |
+
assignments.append((s2, e2, 0))
|
| 311 |
+
print(f"🟢 DEBUG: Deux segments fusionnés en un seul speaker (similarité={sim:.3f})")
|
| 312 |
+
else:
|
| 313 |
+
assignments.append((s1, e1, 0))
|
| 314 |
+
assignments.append((s2, e2, 1))
|
| 315 |
+
print(f"🟦 DEBUG: Deux speakers distincts (similarité={sim:.3f})")
|
| 316 |
+
if progress_callback:
|
| 317 |
+
progress_callback(1.0)
|
| 318 |
+
yield 1.0
|
| 319 |
+
yield assignments
|
| 320 |
+
return
|
| 321 |
|
| 322 |
# Use enhanced diarization if available
|
| 323 |
+
if ENHANCED_DIARIZATION_AVAILABLE and n_embeddings >= 3:
|
| 324 |
print("🚀 Using enhanced diarization with adaptive clustering...")
|
| 325 |
logger.info("🚀 Using enhanced adaptive clustering...")
|
| 326 |
|
|
|
|
| 371 |
diarization_result = []
|
| 372 |
for utt in enhanced_utterances:
|
| 373 |
diarization_result.append((utt['start'], utt['end'], utt['speaker']))
|
| 374 |
+
|
| 375 |
+
# Si l'enhanced pipeline a tout fusionné en un seul segment alors qu'on avait peu de segments
|
| 376 |
+
# on restaure la granularité originale pour ne pas perdre l'alignement temporel côté UI/tests.
|
| 377 |
+
if (
|
| 378 |
+
len(diarization_result) == 1
|
| 379 |
+
and len(valid_utterances) == n_embeddings
|
| 380 |
+
and n_embeddings <= 4
|
| 381 |
+
):
|
| 382 |
+
single_speaker = diarization_result[0][2]
|
| 383 |
+
diarization_result = [
|
| 384 |
+
(s, e, single_speaker) for (s, e, _t) in valid_utterances
|
| 385 |
+
]
|
| 386 |
|
| 387 |
if progress_callback:
|
| 388 |
progress_callback(1.0) # 100% complete
|
| 389 |
yield 1.0
|
| 390 |
+
|
| 391 |
print(f"✅ DEBUG: Enhanced result - {n_speakers} speakers, {len(diarization_result)} segments")
|
| 392 |
logger.info(f"🎭 Enhanced clustering completed! Detected {n_speakers} speakers with {confidence} confidence")
|
| 393 |
+
|
| 394 |
+
yield diarization_result
|
| 395 |
+
return
|
| 396 |
|
| 397 |
except Exception as e:
|
| 398 |
logger.error(f"❌ Enhanced diarization failed: {e}")
|
|
|
|
| 403 |
logger.warning("⚠️ Using fallback clustering")
|
| 404 |
print("⚠️ Using fallback clustering")
|
| 405 |
|
| 406 |
+
gen = faiss_clustering(
|
| 407 |
+
embeddings_array,
|
| 408 |
+
valid_utterances,
|
| 409 |
+
config_dict,
|
| 410 |
+
progress_callback,
|
| 411 |
+
)
|
| 412 |
try:
|
| 413 |
while True:
|
| 414 |
p = next(gen)
|
| 415 |
yield p
|
| 416 |
except StopIteration as e:
|
| 417 |
diarization_result = e.value
|
| 418 |
+
yield diarization_result
|
| 419 |
+
return
|
| 420 |
|
| 421 |
except Exception as e:
|
| 422 |
error_msg = f"❌ Speaker diarization failed: {e}"
|
|
|
|
| 610 |
n_samples, dim = embeddings.shape
|
| 611 |
n_clusters = config_dict['num_speakers']
|
| 612 |
if n_clusters == -1:
|
| 613 |
+
# Si très peu d'échantillons, attribuer tout au locuteur 0
|
| 614 |
+
if n_samples < 3:
|
| 615 |
+
if progress_callback:
|
| 616 |
+
progress_callback(1.0)
|
| 617 |
+
yield 1.0
|
| 618 |
+
return [(s, e, 0) for (s, e, _t) in utterances]
|
| 619 |
+
max_k = min(10, max(2, n_samples // 2))
|
| 620 |
+
best_score, best_k, best_labels = -1.0, 2, None
|
| 621 |
+
emb32 = embeddings.astype(np.float32)
|
| 622 |
for k in range(2, max_k + 1):
|
| 623 |
+
if k >= n_samples: # éviter k == n_samples (silhouette invalide)
|
| 624 |
+
break
|
| 625 |
+
kmeans = faiss.Kmeans(dim, k, niter=25, verbose=False, seed=42)
|
| 626 |
+
kmeans.train(emb32)
|
| 627 |
+
_, lbls = kmeans.index.search(emb32, 1)
|
| 628 |
+
lbls = lbls.ravel()
|
| 629 |
+
uniq = set(lbls)
|
| 630 |
+
if 1 < len(uniq) < n_samples:
|
| 631 |
+
try:
|
| 632 |
+
sil = silhouette_score(embeddings, lbls)
|
| 633 |
+
except Exception:
|
| 634 |
+
sil = -1.0
|
| 635 |
+
else:
|
| 636 |
+
sil = -1.0
|
| 637 |
if sil > best_score:
|
| 638 |
+
best_score, best_k, best_labels = sil, k, lbls
|
| 639 |
+
if best_labels is None:
|
| 640 |
+
# Fallback trivial: tout un seul locuteur
|
| 641 |
+
if progress_callback:
|
| 642 |
+
progress_callback(1.0)
|
| 643 |
+
yield 1.0
|
| 644 |
+
return [(s, e, 0) for (s, e, _t) in utterances]
|
| 645 |
labels = best_labels
|
| 646 |
else:
|
| 647 |
kmeans = faiss.Kmeans(dim, min(n_clusters, n_samples), niter=20, verbose=False, seed=42)
|
|
|
|
| 653 |
progress_callback(1.0)
|
| 654 |
yield 1.0
|
| 655 |
|
| 656 |
+
num_speakers = len(set(labels)) if labels is not None else 1
|
| 657 |
print(f"✅ DEBUG: FAISS clustering — {num_speakers} speakers, {len(utterances)} segments")
|
| 658 |
logger.info(f"🎭 FAISS clustering completed! Detected {num_speakers} speakers")
|
| 659 |
|
| 660 |
+
if labels is None:
|
| 661 |
+
return [(s, e, 0) for (s, e, _t) in utterances]
|
| 662 |
return [(start, end, int(lbl)) for (start, end, _), lbl in zip(utterances, labels)]
|
| 663 |
|
| 664 |
|
tests/conftest.py
ADDED
|
@@ -0,0 +1,62 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Pytest configuration & lightweight LLM mocking.
|
| 2 |
+
|
| 3 |
+
By default (when VOXSUM_RUN_LLM_TESTS != '1'), we *mock* heavy LLM loading
|
| 4 |
+
from `llama_cpp` to avoid native model initialization (which caused segfaults
|
| 5 |
+
in CI / constrained environments).
|
| 6 |
+
|
| 7 |
+
Set VOXSUM_RUN_LLM_TESTS=1 to run the real LLM-dependent tests.
|
| 8 |
+
"""
|
| 9 |
+
from __future__ import annotations
|
| 10 |
+
|
| 11 |
+
import os
|
| 12 |
+
import types
|
| 13 |
+
import pytest
|
| 14 |
+
import sys
|
| 15 |
+
from pathlib import Path
|
| 16 |
+
|
| 17 |
+
ROOT = Path(__file__).resolve().parent.parent
|
| 18 |
+
if str(ROOT) not in sys.path:
|
| 19 |
+
sys.path.insert(0, str(ROOT))
|
| 20 |
+
|
| 21 |
+
# Only install mocks when user explicitly wants to skip heavy LLM tests
|
| 22 |
+
if os.getenv("VOXSUM_SKIP_LLM_TESTS") == "1":
|
| 23 |
+
# Patch src.summarization.get_llm to return a dummy object with needed interface
|
| 24 |
+
import src.summarization as summarization # type: ignore
|
| 25 |
+
|
| 26 |
+
class _DummyLlama:
|
| 27 |
+
def __init__(self):
|
| 28 |
+
self._calls = []
|
| 29 |
+
def create_chat_completion(self, messages, stream=False, **kwargs): # pragma: no cover - simple mock
|
| 30 |
+
# Return a deterministic short response using last user content
|
| 31 |
+
user_content = ""
|
| 32 |
+
for m in messages[::-1]:
|
| 33 |
+
if m.get("role") == "user":
|
| 34 |
+
user_content = m.get("content", "")
|
| 35 |
+
break
|
| 36 |
+
# Provide a minimal plausible answer
|
| 37 |
+
text = "[MOCK] " + (user_content[:80].replace('\n', ' ') if user_content else "Summary")
|
| 38 |
+
return {"choices": [{"message": {"content": text}}]}
|
| 39 |
+
def tokenize(self, data: bytes): # pragma: no cover - trivial
|
| 40 |
+
return list(data[:16]) # pretend small token list
|
| 41 |
+
def detokenize(self, tokens): # pragma: no cover - trivial
|
| 42 |
+
return bytes(tokens)
|
| 43 |
+
|
| 44 |
+
def _mock_get_llm(selected_gguf_model: str): # pragma: no cover - trivial
|
| 45 |
+
return _DummyLlama()
|
| 46 |
+
|
| 47 |
+
# Install the mock only if not already swapped
|
| 48 |
+
if getattr(summarization.get_llm, "__name__", "") != "_mock_get_llm":
|
| 49 |
+
summarization.get_llm = _mock_get_llm # type: ignore
|
| 50 |
+
|
| 51 |
+
@pytest.fixture
|
| 52 |
+
def dummy_llm():
|
| 53 |
+
"""Fixture exposing a dummy LLM (even when real tests run)."""
|
| 54 |
+
if os.getenv("VOXSUM_SKIP_LLM_TESTS") != "1":
|
| 55 |
+
import src.summarization as summarization # type: ignore
|
| 56 |
+
yield summarization.get_llm(list(summarization.available_gguf_llms.keys())[0]) # type: ignore
|
| 57 |
+
else:
|
| 58 |
+
# Provide a standalone dummy consistent with the mock
|
| 59 |
+
class _Faux:
|
| 60 |
+
def create_chat_completion(self, messages, stream=False, **kwargs):
|
| 61 |
+
return {"choices": [{"message": {"content": "[MOCK FIXTURE RESPONSE]"}}]}
|
| 62 |
+
yield _Faux()
|
tests/test_diarization_minimal.py
ADDED
|
@@ -0,0 +1,136 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""Pytest-based minimal sanity tests for `perform_speaker_diarization_on_utterances`.
|
| 3 |
+
|
| 4 |
+
These tests avoid heavy dependencies (sherpa_onnx/faiss/sklearn) by using a mock
|
| 5 |
+
extractor and rely on the lightweight paths & heuristics implemented in
|
| 6 |
+
`src.diarization`.
|
| 7 |
+
|
| 8 |
+
Run:
|
| 9 |
+
pytest -q tests/test_diarization_minimal.py
|
| 10 |
+
|
| 11 |
+
Or standalone (still works):
|
| 12 |
+
python3 tests/test_diarization_minimal.py
|
| 13 |
+
"""
|
| 14 |
+
from __future__ import annotations
|
| 15 |
+
|
| 16 |
+
import sys
|
| 17 |
+
from pathlib import Path
|
| 18 |
+
from typing import Iterable, List, Tuple
|
| 19 |
+
import numpy as np
|
| 20 |
+
import pytest
|
| 21 |
+
|
| 22 |
+
ROOT = Path(__file__).resolve().parent.parent
|
| 23 |
+
if str(ROOT) not in sys.path:
|
| 24 |
+
sys.path.insert(0, str(ROOT))
|
| 25 |
+
|
| 26 |
+
from src.diarization import perform_speaker_diarization_on_utterances # type: ignore
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
EMB_DIM = 192
|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
def _emb(seed: int, delta: float | None = None) -> np.ndarray:
|
| 33 |
+
rng = np.random.default_rng(seed)
|
| 34 |
+
v = rng.normal(size=EMB_DIM).astype(np.float32)
|
| 35 |
+
if delta is not None:
|
| 36 |
+
v = (v + delta).astype(np.float32)
|
| 37 |
+
return v
|
| 38 |
+
|
| 39 |
+
|
| 40 |
+
class MockStream:
|
| 41 |
+
def __init__(self, sample_rate: int, segment: np.ndarray | None):
|
| 42 |
+
self.sample_rate = sample_rate
|
| 43 |
+
self.segment = segment
|
| 44 |
+
def accept_waveform(self, sr, seg): # pragma: no cover - no-op
|
| 45 |
+
pass
|
| 46 |
+
def input_finished(self): # pragma: no cover - no-op
|
| 47 |
+
pass
|
| 48 |
+
|
| 49 |
+
|
| 50 |
+
class MockExtractor:
|
| 51 |
+
"""Mimics the subset of sherpa_onnx SpeakerEmbeddingExtractor we use."""
|
| 52 |
+
def __init__(self, embeddings_sequence: List[np.ndarray]):
|
| 53 |
+
self._embs = embeddings_sequence
|
| 54 |
+
self._i = 0
|
| 55 |
+
def create_stream(self):
|
| 56 |
+
return MockStream(16000, None)
|
| 57 |
+
def compute(self, _stream):
|
| 58 |
+
if self._i >= len(self._embs):
|
| 59 |
+
return self._embs[-1]
|
| 60 |
+
emb = self._embs[self._i]
|
| 61 |
+
self._i += 1
|
| 62 |
+
return emb
|
| 63 |
+
|
| 64 |
+
|
| 65 |
+
def _collect(gen) -> List[Tuple[float, float, int]]:
|
| 66 |
+
result: List[Tuple[float, float, int]] | None = None
|
| 67 |
+
for item in gen:
|
| 68 |
+
if isinstance(item, list):
|
| 69 |
+
result = item # final segments emitted
|
| 70 |
+
break
|
| 71 |
+
if result is None:
|
| 72 |
+
# Drain StopIteration
|
| 73 |
+
try:
|
| 74 |
+
while True:
|
| 75 |
+
next(gen)
|
| 76 |
+
except StopIteration as e:
|
| 77 |
+
result = e.value # type: ignore
|
| 78 |
+
assert result is not None, "Generator produced no result list"
|
| 79 |
+
return result
|
| 80 |
+
|
| 81 |
+
|
| 82 |
+
def _run_case(embeddings: List[np.ndarray], utterances: List[Tuple[float, float, str]]):
|
| 83 |
+
extractor = MockExtractor(embeddings)
|
| 84 |
+
audio = np.zeros(int(16000 * 3), dtype=np.float32) # 3s silence is enough
|
| 85 |
+
gen = perform_speaker_diarization_on_utterances(
|
| 86 |
+
audio=audio,
|
| 87 |
+
sample_rate=16000,
|
| 88 |
+
utterances=utterances,
|
| 89 |
+
embedding_extractor=extractor,
|
| 90 |
+
config_dict={"cluster_threshold": 0.5, "num_speakers": -1},
|
| 91 |
+
progress_callback=None,
|
| 92 |
+
)
|
| 93 |
+
segments = _collect(gen)
|
| 94 |
+
# Basic validation
|
| 95 |
+
for seg in segments:
|
| 96 |
+
assert isinstance(seg, tuple) and len(seg) == 3
|
| 97 |
+
s, e, spk = seg
|
| 98 |
+
assert 0 <= s < e, "Invalid time bounds"
|
| 99 |
+
assert isinstance(spk, int)
|
| 100 |
+
return segments
|
| 101 |
+
|
| 102 |
+
|
| 103 |
+
def test_single_segment():
|
| 104 |
+
utts = [(0.0, 2.0, "Hello world")]
|
| 105 |
+
segs = _run_case([_emb(1)], utts)
|
| 106 |
+
assert len(segs) == 1
|
| 107 |
+
assert segs[0][2] == 0
|
| 108 |
+
|
| 109 |
+
|
| 110 |
+
def test_two_similar_segments_same_speaker():
|
| 111 |
+
base = _emb(2)
|
| 112 |
+
almost_same = (base + 0.001).astype(np.float32)
|
| 113 |
+
utts = [(0.0, 2.0, "Bonjour"), (2.1, 4.0, "Bonjour encore")]
|
| 114 |
+
segs = _run_case([base, almost_same], utts)
|
| 115 |
+
assert len(segs) == 2
|
| 116 |
+
assert len({spk for *_rest, spk in segs}) == 1, "Should have merged speaker IDs"
|
| 117 |
+
|
| 118 |
+
|
| 119 |
+
def test_two_different_segments_distinct_speakers():
|
| 120 |
+
utts = [(0.0, 1.5, "Hola"), (1.6, 3.2, "Adios")]
|
| 121 |
+
segs = _run_case([_emb(10), _emb(200)], utts)
|
| 122 |
+
assert len(segs) == 2
|
| 123 |
+
# Can be 1 or 2 depending on heuristic similarity, but expecting at least one speaker id present
|
| 124 |
+
assert len(segs) >= 1
|
| 125 |
+
|
| 126 |
+
|
| 127 |
+
def test_three_segments_enhanced_or_fallback():
|
| 128 |
+
utts = [(0.0, 1.0, "A"), (1.1, 2.2, "B"), (2.3, 3.4, "C")]
|
| 129 |
+
segs = _run_case([_emb(11), _emb(12), _emb(13)], utts)
|
| 130 |
+
assert len(segs) == 3, "Granularity should be preserved for small n"
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
# Allow running directly without pytest invocation
|
| 134 |
+
if __name__ == "__main__": # pragma: no cover
|
| 135 |
+
import pytest as _pytest
|
| 136 |
+
raise SystemExit(_pytest.main([__file__]))
|
tests/test_multilingual.py
ADDED
|
@@ -0,0 +1,74 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""Multilingual summarization & title tests (LLM heavy by default).
|
| 3 |
+
|
| 4 |
+
Set VOXSUM_SKIP_LLM_TESTS=1 to skip these tests (mocked LLM in conftest).
|
| 5 |
+
Optionally set VOXSUM_GGUF_MODEL to force a specific GGUF model.
|
| 6 |
+
"""
|
| 7 |
+
from __future__ import annotations
|
| 8 |
+
|
| 9 |
+
import os
|
| 10 |
+
import sys
|
| 11 |
+
import pytest
|
| 12 |
+
from pathlib import Path
|
| 13 |
+
|
| 14 |
+
if os.getenv("VOXSUM_SKIP_LLM_TESTS") == "1": # opt-out mechanism
|
| 15 |
+
pytest.skip("LLM tests skipped (unset VOXSUM_SKIP_LLM_TESTS to run)", allow_module_level=True)
|
| 16 |
+
|
| 17 |
+
# Ensure repository root on path
|
| 18 |
+
ROOT = Path(__file__).resolve().parent.parent
|
| 19 |
+
if str(ROOT) not in sys.path:
|
| 20 |
+
sys.path.insert(0, str(ROOT))
|
| 21 |
+
|
| 22 |
+
from src.summarization import summarize_transcript, generate_title # noqa: E402
|
| 23 |
+
from src.utils import available_gguf_llms # noqa: E402
|
| 24 |
+
|
| 25 |
+
|
| 26 |
+
def _select_model():
|
| 27 |
+
env_choice = os.getenv("VOXSUM_GGUF_MODEL")
|
| 28 |
+
if env_choice and env_choice in available_gguf_llms:
|
| 29 |
+
return env_choice
|
| 30 |
+
for cand in ["Gemma-3-270M", "Gemma-3-3N-E2B", "Gemma-3-3N-E4B", "Gemma-3-1B"]:
|
| 31 |
+
if cand in available_gguf_llms:
|
| 32 |
+
return cand
|
| 33 |
+
return next(iter(available_gguf_llms))
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
# Test transcripts in different languages
|
| 37 |
+
TEST_TRANSCRIPTS = {
|
| 38 |
+
"english": """
|
| 39 |
+
Hello everyone, today we're going to discuss artificial intelligence and its impact on modern society.
|
| 40 |
+
AI has become increasingly important in our daily lives, from voice assistants like Siri and Alexa,
|
| 41 |
+
to recommendation systems on Netflix and YouTube. The technology is advancing rapidly, with machine
|
| 42 |
+
learning algorithms becoming more sophisticated every day. However, we must also consider the ethical
|
| 43 |
+
implications of AI development, including privacy concerns, job displacement, and the potential for bias
|
| 44 |
+
in automated decision-making systems. It's crucial that we develop AI responsibly to ensure it benefits
|
| 45 |
+
all of humanity rather than just a select few.
|
| 46 |
+
""",
|
| 47 |
+
"french": """
|
| 48 |
+
Bonjour à tous, aujourd'hui nous allons discuter de l'intelligence artificielle et de son impact sur la société moderne.
|
| 49 |
+
L'IA est devenue de plus en plus importante dans notre vie quotidienne, des assistants vocaux comme Siri et Alexa,
|
| 50 |
+
aux systèmes de recommandation sur Netflix et YouTube. La technologie progresse rapidement, avec des algorithmes
|
| 51 |
+
d'apprentissage automatique devenant plus sophistiqués chaque jour. Cependant, nous devons également considérer
|
| 52 |
+
les implications éthiques du développement de l'IA, y compris les préoccupations de confidentialité, le déplacement
|
| 53 |
+
d'emplois, et le potentiel de biais dans les systèmes de prise de décision automatisée. Il est crucial que nous
|
| 54 |
+
développions l'IA de manière responsable pour assurer qu'elle bénéficie à toute l'humanité plutôt qu'à une élite.
|
| 55 |
+
""",
|
| 56 |
+
}
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
def test_multilingual_summarization():
|
| 60 |
+
model_name = _select_model()
|
| 61 |
+
for language, transcript in TEST_TRANSCRIPTS.items():
|
| 62 |
+
parts = list(summarize_transcript(transcript, model_name, "Summarize this transcript"))
|
| 63 |
+
summary = "".join(parts)
|
| 64 |
+
assert summary, f"Empty summary for {language}"
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
def test_language_consistency():
|
| 68 |
+
model_name = _select_model()
|
| 69 |
+
for language, transcript in TEST_TRANSCRIPTS.items():
|
| 70 |
+
title = generate_title(transcript, model_name)
|
| 71 |
+
parts = list(summarize_transcript(transcript, model_name, "Summarize this transcript"))
|
| 72 |
+
summary = "".join(parts)
|
| 73 |
+
assert title and summary
|
| 74 |
+
assert len(title) < 120
|
tests/test_multilingual_quick.py
ADDED
|
@@ -0,0 +1,36 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""Quick multilingual title smoke tests (LLM)."""
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
import os, sys, pytest
|
| 5 |
+
from pathlib import Path
|
| 6 |
+
|
| 7 |
+
if os.getenv("VOXSUM_SKIP_LLM_TESTS") == "1":
|
| 8 |
+
pytest.skip("LLM tests skipped (unset VOXSUM_SKIP_LLM_TESTS to run)", allow_module_level=True)
|
| 9 |
+
|
| 10 |
+
ROOT = Path(__file__).resolve().parent.parent
|
| 11 |
+
if str(ROOT) not in sys.path:
|
| 12 |
+
sys.path.insert(0, str(ROOT))
|
| 13 |
+
|
| 14 |
+
from src.summarization import generate_title # noqa: E402
|
| 15 |
+
from src.utils import available_gguf_llms # noqa: E402
|
| 16 |
+
|
| 17 |
+
def _select_model():
|
| 18 |
+
env_choice = os.getenv("VOXSUM_GGUF_MODEL")
|
| 19 |
+
if env_choice and env_choice in available_gguf_llms:
|
| 20 |
+
return env_choice
|
| 21 |
+
for cand in ["Gemma-3-270M", "Gemma-3-3N-E2B", "Gemma-3-3N-E4B", "Gemma-3-1B"]:
|
| 22 |
+
if cand in available_gguf_llms:
|
| 23 |
+
return cand
|
| 24 |
+
return next(iter(available_gguf_llms))
|
| 25 |
+
|
| 26 |
+
TEST_TRANSCRIPTS = {
|
| 27 |
+
"english": "Hello everyone, today we're going to discuss artificial intelligence and its impact.",
|
| 28 |
+
"french": "Bonjour à tous, aujourd'hui nous allons discuter de l'intelligence artificielle.",
|
| 29 |
+
}
|
| 30 |
+
|
| 31 |
+
def test_multilingual_titles():
|
| 32 |
+
model_name = _select_model()
|
| 33 |
+
for language, transcript in TEST_TRANSCRIPTS.items():
|
| 34 |
+
title = generate_title(transcript, model_name)
|
| 35 |
+
assert title, f"Empty title for {language}"
|
| 36 |
+
assert len(title.split()) <= 15
|
tests/test_summary_language.py
ADDED
|
@@ -0,0 +1,33 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""Single-language summary smoke test (LLM)."""
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
import os, sys, pytest
|
| 5 |
+
from pathlib import Path
|
| 6 |
+
|
| 7 |
+
if os.getenv("VOXSUM_SKIP_LLM_TESTS") == "1":
|
| 8 |
+
pytest.skip("LLM tests skipped (unset VOXSUM_SKIP_LLM_TESTS to run)", allow_module_level=True)
|
| 9 |
+
|
| 10 |
+
ROOT = Path(__file__).resolve().parent.parent
|
| 11 |
+
if str(ROOT) not in sys.path:
|
| 12 |
+
sys.path.insert(0, str(ROOT))
|
| 13 |
+
|
| 14 |
+
from src.summarization import summarize_transcript # noqa: E402
|
| 15 |
+
from src.utils import available_gguf_llms # noqa: E402
|
| 16 |
+
|
| 17 |
+
def _select_model():
|
| 18 |
+
env_choice = os.getenv("VOXSUM_GGUF_MODEL")
|
| 19 |
+
if env_choice and env_choice in available_gguf_llms:
|
| 20 |
+
return env_choice
|
| 21 |
+
for cand in ["Gemma-3-270M", "Gemma-3-3N-E2B", "Gemma-3-3N-E4B", "Gemma-3-1B"]:
|
| 22 |
+
if cand in available_gguf_llms:
|
| 23 |
+
return cand
|
| 24 |
+
return next(iter(available_gguf_llms))
|
| 25 |
+
|
| 26 |
+
def test_single_language_summary():
|
| 27 |
+
model = _select_model()
|
| 28 |
+
transcript = ("Bonjour à tous, aujourd'hui nous allons discuter de l'intelligence artificielle et "
|
| 29 |
+
"de son impact sur la société moderne. L'IA transforme déjà nos usages.")
|
| 30 |
+
parts = list(summarize_transcript(transcript, model, "Résumez ce transcript"))
|
| 31 |
+
summary = "".join(parts)
|
| 32 |
+
assert summary
|
| 33 |
+
assert len(summary) < 2000
|