# Sylber

This is official implementation of [Sylber: Syllabic Embedding Representation of Speech from Raw Audio](https://arxiv.org/abs/2410.07168). 

Sylber is the first of its kind that yields extremely short tokens from raw audio (on average, 4.27 tokens/sec) through dynamic tokenization at the syllable granularity.

The model is developed and trained by Berkeley Speech Group.


## Installation

The model can be installed through pypi for inference. 

```
pip install sylber
```

### Usage

```python

from sylber import Segmenter

# Loading Sylber
segmenter = Segmenter(model_ckpt="sylber")


# Run Sylber
wav_file = "samples/sample.wav"

outputs = segmenter(wav_file, in_second=True) # in_second can be False to output segments in frame numbers.

# outputs = {"segments": numpy array of [start, end] of segment,
#            "segment_features": numpy array of segment-averaged features,
#            "hidden_states": numpy array of raw features used for segmentation.
```


### Training

Please check [https://github.com/Berkeley-Speech-Group/sylber](https://github.com/Berkeley-Speech-Group/sylber) for training the model. 

---
license: apache-2.0
---