# Manoghn/tinyllama-lesson-synthesizer

## Model Description
This repository hosts Manoghn/tinyllama-lesson-synthesizer, a fine-tuned TinyLlama/TinyLlama-1.1B-Chat-v1.0 model designed to generate comprehensive and engaging educational lessons. It's a key component of the larger SynthAI project, which aims to create multi-modal learning content including lessons, images, quizzes, and audio narration.
The model has been specifically adapted using LoRA (Low-Rank Adaptation) to excel at generating structured, informative text suitable for educational purposes across various domains.
## Objective
The primary objective of this fine-tuned model is to automatically generate detailed educational lessons on diverse topics. By providing a topic, the model produces well-structured, Markdown-formatted content, serving as a foundation for broader educational material synthesis.
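The snippet below is a minimal usage sketch, not an official example: it assumes the repository hosts a LoRA adapter that is loaded on top of the base checkpoint with `peft`, and the prompt wording is illustrative rather than the exact training prompt.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_id = "Manoghn/tinyllama-lesson-synthesizer"

# Load the base model and attach the LoRA adapter from this repository.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Build a chat-style prompt asking for a lesson on a topic
# (the system/user wording here is illustrative, not the training prompt).
messages = [
    {"role": "system", "content": "You are an expert educational content creator."},
    {"role": "user", "content": "Write a structured Markdown lesson on Photosynthesis."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```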
## Training Data
The model was fine-tuned on a custom-curated dataset of 60 educational lessons.
- Data Collection: Lessons were generated with the Llama-3.1-8B-Instruct model via the Hugging Face Inference Client (a data-generation sketch follows this list). Each lesson was produced from a detailed prompt instructing the model to act as an "expert educational content creator."
- Content Structure: The generated lessons adhered to a specific Markdown format, including:
- A descriptive level-1 heading.
- An introduction explaining the topic's importance.
- 3-5 key concepts with clear explanations.
- Real-world applications or examples.
- Practical examples, formulas, or code snippets (if relevant).
- A concise summary.
- Domains Covered: The dataset spans four educational domains:
- Science (e.g., Photosynthesis, Newton's Laws of Motion)
- Mathematics (e.g., Pythagorean Theorem, Quadratic Equations)
- Computer Science (e.g., Binary Number System, Data Structures Overview)
- Humanities (e.g., Renaissance Art Period, World War II Causes)
- Dataset Size: The final dataset comprised 60 high-quality lesson examples, split into training (70%), validation (15%), and test (15%) sets.
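As a rough illustration of the data-collection step, the sketch below recreates the pipeline described above. The exact system prompt, decoding parameters, and split seed are not documented here, so the values shown are assumptions.

```python
from huggingface_hub import InferenceClient
from datasets import Dataset

client = InferenceClient(model="meta-llama/Llama-3.1-8B-Instruct")

# Paraphrase of the lesson-structure prompt described above (not the exact wording).
SYSTEM_PROMPT = (
    "You are an expert educational content creator. Write a Markdown lesson with a "
    "descriptive level-1 heading, an introduction explaining the topic's importance, "
    "3-5 key concepts, real-world applications, practical examples or code snippets "
    "where relevant, and a concise summary."
)

# A few example topics per domain (the full dataset covers 60 lessons).
topics = {
    "Science": ["Photosynthesis", "Newton's Laws of Motion"],
    "Mathematics": ["Pythagorean Theorem", "Quadratic Equations"],
    "Computer Science": ["Binary Number System", "Data Structures Overview"],
    "Humanities": ["Renaissance Art Period", "World War II Causes"],
}

records = []
for domain, names in topics.items():
    for topic in names:
        response = client.chat_completion(
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Create a lesson on: {topic}"},
            ],
            max_tokens=1024,
        )
        records.append(
            {"domain": domain, "topic": topic, "lesson": response.choices[0].message.content}
        )

# Split 70/15/15 into train/validation/test (matching the ratios described above).
ds = Dataset.from_list(records).train_test_split(test_size=0.3, seed=42)
held_out = ds["test"].train_test_split(test_size=0.5, seed=42)
splits = {"train": ds["train"], "validation": held_out["train"], "test": held_out["test"]}
```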
## Fine-tuning Methodology
The Manoghn/tinyllama-lesson-synthesizer model was fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0 using Parameter-Efficient Fine-tuning (PEFT) with LoRA.
- Base Model: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
- Quantization: The base model was loaded with 8-bit quantization using `BitsAndBytesConfig` to reduce memory footprint and enable training in resource-constrained environments (Colab free-tier T4 GPU).
- LoRA Configuration:
  - `r=8`: LoRA rank
  - `lora_alpha=32`: Scaling factor
  - `target_modules=["q_proj", "v_proj"]`: LoRA adapters applied to the query and value projection layers
  - `lora_dropout=0.05`
  - `bias="none"`
  - `task_type=TaskType.CAUSAL_LM`
- Training Parameters (`transformers.TrainingArguments`):
  - `output_dir="/content/drive/MyDrive/genai_synthesizer/results"`
  - `per_device_train_batch_size=1`
  - `per_device_eval_batch_size=1`
  - `learning_rate=2e-4`
  - `num_train_epochs=1`
  - `logging_steps=10`
  - `fp16=True`
  - `report_to="none"`
- Training Environment: Fine-tuning was performed on a Google Colab free-tier T4 GPU. A reproduction sketch of this setup follows below.
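The following is a minimal sketch that assembles the configuration listed above. The `splits` dict refers to the hypothetical data-preparation sketch earlier; anything not in the bullet points (tokenization length, data collator, padding token handling) is an assumption rather than the documented setup.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load the base model in 8-bit so it fits on a Colab T4.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(base_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention query/value projections, per the configuration above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)

# Training arguments as listed above.
training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/genai_synthesizer/results",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
    fp16=True,
    report_to="none",
)

# Tokenize the lesson text; `splits` comes from the data-preparation sketch above
# (the max_length value is an assumption).
def tokenize(example):
    return tokenizer(example["lesson"], truncation=True, max_length=1024)

tokenized = {name: ds.map(tokenize, remove_columns=ds.column_names) for name, ds in splits.items()}

# The collator pads batches and builds causal-LM labels from the inputs.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=collator,
)
trainer.train()
```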