---
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
---

# Korla/Wav2Vec2BertForCTC-hsb

## Model Description

**Wav2Vec2BertForCTC-hsb** is a fine-tuned [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) model with a BERT-style character classification head, adapted for **Upper Sorbian** automatic speech recognition (ASR). It was fine-tuned with a CTC (Connectionist Temporal Classification) loss and transcribes Upper Sorbian audio to text.
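
For readers unfamiliar with CTC, the following is a minimal sketch of greedy CTC decoding: the network emits one label per audio frame, consecutive repeats are merged, and blank labels are dropped. The toy vocabulary, the blank index `0`, and the function name `ctc_greedy_collapse` are illustrative assumptions, not part of this model or its tokenizer.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, id_to_char, blank_id=0):
    """Collapse a per-frame label sequence the way greedy CTC decoding does."""
    # Step 1: merge runs of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_ids)]
    # Step 2: drop the blank label and map the remaining ids to characters.
    return "".join(id_to_char[i] for i in merged if i != blank_id)


# Hypothetical toy vocabulary; the real model ships its own tokenizer.
toy_vocab = {0: "", 1: "w", 2: "o", 3: "d", 4: "a"}
frames = [1, 1, 0, 2, 2, 0, 3, 0, 0, 4, 4]
print(ctc_greedy_collapse(frames, toy_vocab))
```

The blank label is what lets CTC emit a doubled character: `[1, 0, 1]` decodes to two `w` characters, while `[1, 1]` decodes to one.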

## Usage

This model can be used for speech-to-text tasks on Upper Sorbian audio.

An optional **5-gram language model (`5gram.bin`)** is provided for decoding with an external LM scorer. This n-gram model was trained on a corpus of **Upper Sorbian Holy Masses**, which can help improve decoding accuracy for religious or formal speech domains.

## Training Data

The model was fine-tuned on a dataset provided by the **Foundation for the Sorbian People**, which consists of high-quality recordings and transcripts in Upper Sorbian. The dataset covers a range of speakers and recording conditions, which is intended to make the acoustic model more robust.

## Language Model

- **Name:** `5gram.bin`
- **Type:** 5-gram character-level KenLM language model
- **Domain:** Upper Sorbian religious speech (Holy Masses)
- **Usage:** For decoding with tools such as [CTCDecoder](https://github.com/parlance/ctcdecode).

## Limitations

- The model's accuracy may degrade on informal or highly dialectal speech not represented in the training data.  
- The language model is domain-specific (religious speech) and may bias decoding toward that context.  
- The model supports only **Upper Sorbian**, not Lower Sorbian or other Slavic languages.

## How to Use
For normal use (without LM) you can load the model into a pipeline.
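
A minimal sketch of pipeline usage, assuming a recent `transformers` release with Wav2Vec2-BERT support and `ffmpeg` available for reading audio files. The helper name `transcribe` and the file name `example.wav` are hypothetical.

```python
MODEL_ID = "Korla/Wav2Vec2BertForCTC-hsb"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file with the plain CTC model (no language model)."""
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The ASR pipeline returns a dict whose "text" field holds the transcript.
    return asr(audio_path)["text"]


# Example call (hypothetical file name):
# print(transcribe("example.wav"))
```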

To decode with the 5-gram language model, use a CTC beam-search decoder with LM support, such as the pyctcdecode library.
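
One way this might look with pyctcdecode is sketched below. It rests on several assumptions that should be checked against the repository: that `5gram.bin` sits at the repository root, that the repository provides a processor loadable with `AutoProcessor`, that the model expects 16 kHz audio, and that the tokenizer vocabulary and the LM are compatible with pyctcdecode's alphabet handling. The helper names `labels_from_vocab` and `transcribe_with_lm` are hypothetical, and `librosa` is just one option for loading audio.

```python
MODEL_ID = "Korla/Wav2Vec2BertForCTC-hsb"


def labels_from_vocab(vocab):
    """Return tokens ordered by index so they line up with the logit columns."""
    return [token for token, _ in sorted(vocab.items(), key=lambda kv: kv[1])]


def transcribe_with_lm(audio_path, model_id=MODEL_ID, lm_filename="5gram.bin"):
    """Beam-search decode one file using the KenLM model (assumed file layout)."""
    # Lazy imports: these third-party packages are only needed when decoding.
    import librosa
    import torch
    from huggingface_hub import hf_hub_download
    from pyctcdecode import build_ctcdecoder
    from transformers import AutoModelForCTC, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCTC.from_pretrained(model_id)

    # Assumption: the LM file is stored at the repository root under this name.
    lm_path = hf_hub_download(repo_id=model_id, filename=lm_filename)
    labels = labels_from_vocab(processor.tokenizer.get_vocab())
    decoder = build_ctcdecoder(labels, kenlm_model_path=lm_path)

    # Assumption: 16 kHz mono input, as is usual for Wav2Vec2-family models.
    audio, _ = librosa.load(audio_path, sr=16_000)
    inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # shape: (frames, vocab_size)

    return decoder.decode(logits.cpu().numpy())


# Example call (hypothetical file name):
# print(transcribe_with_lm("example.wav"))
```

If the number of labels does not match the last dimension of the logits, pyctcdecode will not decode correctly, so it is worth comparing the two before relying on the output.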

## Citation
Please cite as:
```bibtex
@misc{korla_wav2vec2bertforctc_hsb,
  author       = {Karl Baier},
  title        = {Wav2Vec2BertForCTC-hsb},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Korla/Wav2Vec2BertForCTC-hsb}},
}
```