
Model Description

This model is a fine-tuned version of all-MiniLM-L6-v2 adapted to mental health discussion data from Reddit forums using the TSDAE (Transformer-based Sequential Denoising Auto-Encoder) approach.

The base model, all-MiniLM-L6-v2, is a sentence transformer that maps sentences and paragraphs to a 384-dimensional dense vector space. This adapted version has been specifically trained on domain-specific text from mental health subreddits to better capture the semantic nuances and terminology common in mental health discussions.

Training Approach

TSDAE is an unsupervised training method that encodes damaged sentences (with deleted or swapped tokens) into fixed-sized vectors and requires a decoder to reconstruct the original sentences from these embeddings. This process ensures that semantic information is well-captured in the sentence embeddings produced by the encoder. TSDAE has been shown to be particularly effective for domain adaptation, significantly outperforming traditional approaches like Masked Language Model (MLM) for sentence embedding tasks.
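The "damaged sentences" above can be illustrated with deletion noise, which the TSDAE paper uses by default with a deletion ratio of about 0.6. This is a simplified, whitespace-level sketch of the idea, not the library's tokenizer-level implementation:

```python
import random

def delete_tokens(text, deletion_ratio=0.6, rng=None):
    """Damage a sentence by deleting each token with probability deletion_ratio."""
    rng = rng or random.Random()
    tokens = text.split()
    kept = [t for t in tokens if rng.random() >= deletion_ratio]
    if not kept:
        # Never return an empty sentence; keep at least one token.
        kept = [rng.choice(tokens)]
    return " ".join(kept)

noisy = delete_tokens("I have been struggling with anxiety lately",
                      rng=random.Random(0))
print(noisy)  # a random subset of the original tokens
```

During TSDAE training, the encoder embeds the damaged sentence and a decoder must reconstruct the original from that single vector, which forces the embedding to retain the sentence's semantics.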

Intended Use

This model is designed for sentence embedding tasks in the mental health domain, including:

  • Semantic similarity between mental health-related texts
  • Clustering of mental health discussion posts
  • Information retrieval from mental health forums
  • Classification and analysis of mental health content
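For the similarity, clustering, and retrieval use cases above, the model's embeddings are typically compared with cosine similarity. A minimal NumPy sketch (the toy vectors below stand in for real 384-dimensional model outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between embedding batches: (n, d) x (m, d) -> (n, m)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Toy 4-dimensional stand-ins for real 384-dimensional embeddings.
queries = np.array([[1.0, 0.0, 1.0, 0.0]])
corpus = np.array([[1.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0, 1.0]])
scores = cosine_similarity(queries, corpus)
print(scores)  # identical vectors score 1.0, orthogonal vectors 0.0
```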

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

DenoisingAutoEncoderLoss

@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}