Update README.md
README.md
CHANGED
@@ -23,7 +23,7 @@ pipeline_tag: audio-classification
 - **Architecture**: 12-layer Transformer encoder (768-dim) with a 7-layer 1D CNN frontend
 - **Input**: Raw mono audio at 24kHz
 - **Training Context Length**: 5 seconds
-- **Pretraining Objective**: MLM-style multi-task masked prediction of discrete [EnCodec](https://huggingface.co/facebook/encodec_24khz) acoustic tokens and continuous constant-Q transform (CQT) spectrogram reconstruction at a
+- **Pretraining Objective**: MLM-style multi-task masked prediction of discrete [EnCodec](https://huggingface.co/facebook/encodec_24khz) acoustic tokens and continuous constant-Q transform (CQT) spectrogram reconstruction at a 75 Hz feature rate
 
 ---
 
@@ -51,6 +51,7 @@ We evaluate **CultureMERT-95M** via probing on both Western and non-Western auto
 - **mAP** (Mean Average Precision)
 - **Micro-F1** and **Macro-F1**
 
+
 Evaluation follows the [MARBLE](https://github.com/a43992899/MARBLE) protocol under constrained settings. We use standardized train/test splits from [ccml](https://github.com/pxaris/ccml) for continual pre-training and probing-based evaluation.
 
 
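The 75 Hz feature rate added in this change, together with the 24kHz input and 5-second training context stated in the README, implies a nominal frame size and sequence length. The sketch below works through that arithmetic only. The function names are hypothetical, and it assumes the feature rate divides the sample rate evenly and ignores boundary effects of the CNN frontend, so the real model may emit slightly fewer frames per clip.

```python
# Values taken from the README: 24 kHz input, 5 s context, 75 Hz feature rate.
SAMPLE_RATE_HZ = 24_000
CONTEXT_SECONDS = 5
FEATURE_RATE_HZ = 75


def samples_per_frame(sample_rate_hz: int, feature_rate_hz: int) -> int:
    """Nominal number of audio samples covered by one feature frame (hypothetical helper)."""
    if sample_rate_hz % feature_rate_hz != 0:
        raise ValueError("feature rate does not divide the sample rate evenly")
    return sample_rate_hz // feature_rate_hz


def frames_in_context(context_seconds: int, feature_rate_hz: int) -> int:
    """Nominal number of feature frames in one training context (hypothetical helper)."""
    return context_seconds * feature_rate_hz


if __name__ == "__main__":
    hop = samples_per_frame(SAMPLE_RATE_HZ, FEATURE_RATE_HZ)
    frames = frames_in_context(CONTEXT_SECONDS, FEATURE_RATE_HZ)
    print(f"nominal samples per frame: {hop}")
    print(f"nominal frames per {CONTEXT_SECONDS} s context: {frames}")
```

Under these assumptions each feature frame spans 320 input samples, and a 5-second clip maps to a nominal 375 frames.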