🎯 Namo Turn Detector v1 - Japanese


🚀 Namo Turn Detection Model for Japanese


📋 Overview

The Namo Turn Detector is a specialized AI model designed to solve one of the most challenging problems in conversational AI: knowing when a user has finished speaking.

This Japanese-specific model classifies a transcribed utterance into one of two classes:

  • ✅ Complete utterance (the user is done speaking)
  • 🔄 Incomplete utterance (the user will continue speaking)

Built on the DistilBERT architecture and exported to a quantized ONNX format, the model is compact (~135MB) and fast enough for real-time use (<14ms per inference, as reported under Performance Metrics).
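One way a voice agent might use the detector is to re-run it on each partial transcript as the user speaks and respond only once an utterance is classified as complete. The minimal sketch below illustrates that loop under stated assumptions: `predict` is any callable returning `(label, confidence)` with the label convention used in the Quick Start section (1 = End of Turn), and `first_end_of_turn`, `toy_predict`, and the 0.8 confidence threshold are hypothetical names and values, not part of this model's release.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical predictor type: text -> (label, confidence),
# where label 1 = "End of Turn" and 0 = "Not End of Turn".
Predictor = Callable[[str], Tuple[int, float]]


def first_end_of_turn(
    partial_transcripts: Iterable[str],
    predict: Predictor,
    threshold: float = 0.8,  # illustrative cut-off, not tuned for this model
) -> Optional[int]:
    """Return the index of the first transcript judged complete, else None."""
    for i, text in enumerate(partial_transcripts):
        label, confidence = predict(text)
        # Hand the turn to the agent only when the model is confident
        # that the user has finished speaking.
        if label == 1 and confidence >= threshold:
            return i
    return None


def toy_predict(text: str) -> Tuple[int, float]:
    """Stand-in for the real model: treats a trailing "。" as a turn ending."""
    return (1, 0.95) if text.endswith("。") else (0, 0.9)


partials = [
    "1382年に",
    "1382年に聖パウロ修道会のために",
    "1382年に聖パウロ修道会のために建てられた僧院です。",
]
print(first_end_of_turn(partials, toy_predict))  # 2
```

In a real pipeline, `toy_predict` would be replaced by the `predict` method of the `TurnDetector` class from the Quick Start section.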

🔑 Key Features

  • Turn Detection Specialist: Detects end-of-turn vs. continuation in Japanese speech transcripts.
  • Low Latency: Optimized with quantized ONNX for <14ms inference.
  • Robust Performance: 93.5% accuracy on diverse Japanese utterances.
  • Easy Integration: Compatible with Python, ONNX Runtime, and VideoSDK Agents SDK.
  • Enterprise Ready: Supports real-time conversational AI and voice assistants.

📊 Performance Metrics

| Metric | Score |
|---|---|
| 🎯 Accuracy | 93.52% |
| 📈 F1-Score | 93.87% |
| 🎪 Precision | 89.61% |
| 🎭 Recall | 98.57% |
| ⚡ Latency | <14 ms |
| 💾 Model Size | ~135 MB |

📊 Evaluated on 800+ Japanese utterances from diverse conversational contexts.
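As a consistency check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. Because those inputs are themselves rounded, the result agrees with the reported 93.87% only to within rounding.

```python
# Precision and recall as reported in the Performance Metrics table.
precision = 0.8961
recall = 0.9857

# F1 = 2PR / (P + R), the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.9388
```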

⚡️ Speed Analysis

[Figure: speed analysis chart]
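Inference latency depends on hardware, thread settings, and input length, so it is worth measuring on the target machine. A minimal timing sketch follows; `measure_latency_ms` is a hypothetical helper, and because it times whatever callable it is given (for example `detector.predict`, which also includes tokenization), its numbers may differ from the reported <14ms, whose measurement setup is not described here.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(
    fn: Callable[[], object], runs: int = 100, warmup: int = 10
) -> Dict[str, float]:
    """Call `fn` repeatedly and summarize wall-clock latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls (caches, lazy initialization) are not recorded

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        # Nearest-rank approximation of the 95th percentile (no interpolation).
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "max_ms": samples[-1],
    }
```

With the Quick Start class, this would be called as `measure_latency_ms(lambda: detector.predict("1382年に聖パウロ修道会のために建てられた僧院です。"))`, where `detector = TurnDetector()`.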

🔧 Train & Test Scripts

  • Train Script
  • Test Script

🛠️ Installation

To use this model, you will need to install the following libraries.

pip install onnxruntime transformers huggingface_hub
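After installing, one quick way to confirm the packages are importable is to look them up with the standard library's `importlib.util.find_spec`, which locates a top-level module without importing it. The helper name `check_dependencies` is hypothetical; `numpy` is included because the Quick Start code imports it directly (it is normally installed alongside onnxruntime).

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the packages installed above, plus numpy,
# which the Quick Start code imports directly.
REQUIRED = ("onnxruntime", "transformers", "huggingface_hub", "numpy")


def check_dependencies(modules: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Report whether each top-level module can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```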

🚀 Quick Start

You can run inference directly from the Hugging Face repository:

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

class TurnDetector:
    def __init__(self, repo_id="videosdk-live/Namo-Turn-Detector-v1-Japanese"):
        """
        Initializes the detector by downloading the model and tokenizer
        from the Hugging Face Hub.
        """
        print(f"Loading model from repo: {repo_id}")
        
        # Download the model and tokenizer from the Hub
        # Authentication is handled automatically if you are logged in
        model_path = hf_hub_download(repo_id=repo_id, filename="model_quant.onnx")
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)
        
        # Set up the ONNX Runtime inference session
        self.session = ort.InferenceSession(model_path)
        self.max_length = 512
        print("✅ Model and tokenizer loaded successfully.")

    def predict(self, text: str) -> tuple:
        """
        Predicts if a given text utterance is the end of a turn.
        Returns (predicted_label, confidence) where:
        - predicted_label: 0 for "Not End of Turn", 1 for "End of Turn"
        - confidence: confidence score between 0 and 1
        """
        # Tokenize the input text
        inputs = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            return_tensors="np"
        )
        
        # Prepare the feed dictionary for the ONNX model
        feed_dict = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]
        }
        
        # Run inference
        outputs = self.session.run(None, feed_dict)
        logits = outputs[0]

        probabilities = self._softmax(logits[0])
        predicted_label = int(np.argmax(probabilities))  # plain int, not a NumPy scalar
        confidence = float(np.max(probabilities))

        return predicted_label, confidence

    def _softmax(self, x, axis=None):
        if axis is None:
            axis = -1
        exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

# --- Example Usage ---
if __name__ == "__main__":
    detector = TurnDetector()
    
    sentences = [
        # "It is a monastery built in 1382 for the Order of Saint Paul."
        "1382年に聖パウロ修道会のために建てられた僧院です。",  # Expected: End of Turn
        # "Because the 1st Far Eastern Olympics opened in Manila in 1913..."
        "1913年マニラで第1回東洋オリンピックが開会だから",  # Expected: Not End of Turn
    ]
    
    for sentence in sentences:
        predicted_label, confidence = detector.predict(sentence)
        result = "End of Turn" if predicted_label == 1 else "Not End of Turn"
        print(f"'{sentence}' -> {result} (confidence: {confidence:.3f})")
        print("-" * 50)
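`TurnDetector.predict` scores one utterance at a time. If several transcripts need scoring together, one option is to tokenize them as a padded batch (for example `self.tokenizer(texts, padding=True, truncation=True, max_length=self.max_length, return_tensors="np")`) and call `self.session.run` once; this assumes the exported ONNX graph accepts a dynamic batch dimension, which this card does not state. The post-processing then generalizes to a `(batch, 2)` logits array, as in this sketch, where `postprocess_batch` is a hypothetical helper mirroring the softmax-and-argmax logic of `predict`.

```python
from typing import List, Tuple

import numpy as np


def postprocess_batch(logits: np.ndarray) -> Tuple[List[int], List[float]]:
    """Convert (batch, 2) logits into per-row labels and confidences.

    Same logic as TurnDetector.predict, applied row-wise:
    softmax over the class axis, then argmax.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row max before exponentiating for numerical stability.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    labels = probs.argmax(axis=-1)  # 1 = "End of Turn", 0 = "Not End of Turn"
    confidences = probs.max(axis=-1)
    return labels.tolist(), confidences.tolist()


labels, confidences = postprocess_batch(np.array([[0.0, 2.0], [3.0, 0.0]]))
print(labels)  # [1, 0]
```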

🤖 VideoSDK Agents Integration

Integrate this turn detector directly with VideoSDK Agents for production-ready conversational AI applications.

from videosdk_agents import NamoTurnDetectorV1, pre_download_namo_turn_v1_model

# Download the model files ahead of time
pre_download_namo_turn_v1_model(language="ja")

# Initialize Japanese turn detector for VideoSDK Agents
turn_detector = NamoTurnDetectorV1(language="ja")

📚 Complete Integration Guide: learn how to use NamoTurnDetectorV1 with VideoSDK Agents

📖 Citation

@misc{namo_turn_detector_ja_2025,
  title={Namo Turn Detector v1: Japanese},
  author={VideoSDK Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Japanese},
  note={ONNX-optimized DistilBERT for turn detection in Japanese}
}

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Made with ❤️ by the VideoSDK Team

