# HuggingFace Collection Integration - Complete
## 🎯 Overview
Full integration with HuggingFace Collection for LEMM LoRAs and datasets, including automatic syncing, import/export, and name conflict resolution.
## ✅ Implemented Features
### 1. **Dataset Import** (`import_prepared_dataset`)
- **Location**: `backend/services/dataset_service.py`
- **Purpose**: Import prepared datasets from ZIP files
- **Features**:
- Supports both root-level and subfolder `dataset_info.json` structures
- Automatic name conflict resolution with numeric suffixes (`_1`, `_2`, etc.)
- Validates dataset structure before import
- Updates metadata with new dataset key if renamed
```python
# Example usage in app.py
def import_dataset(zip_file):
    dataset_service = DatasetService()
    dataset_key = dataset_service.import_prepared_dataset(zip_file)
    return f"✅ Imported dataset: {dataset_key}"
```
### 2. **LoRA Collection Sync** (`sync_on_startup`)
- **Location**: `backend/services/hf_storage_service.py`
- **Purpose**: Automatically download missing LoRAs from HF collection on app startup
- **Features**:
- Lists all LoRAs in collection
- Compares with local LoRA directory
- Downloads only missing LoRAs
- Handles name conflicts with numeric suffixes
- Logs sync activity
```python
# Called automatically on app startup (app.py line 82)
hf_storage = HFStorageService(username="Gamahea", collection_slug="lemm-100-pre-beta")
sync_result = hf_storage.sync_on_startup(loras_dir=Path("models/loras"))
```
### 3. **Enhanced LoRA Upload**
- **Location**: `app.py` - `start_lora_training()` function
- **Purpose**: Upload trained LoRAs to HF collection with full metadata
- **Features**:
- Uploads LoRA to individual model repo
- Adds to collection automatically
- Includes training config in metadata
- Returns repo URL and collection link
- Graceful error handling (saves locally if upload fails)
```python
# Upload after training (app.py lines 1397-1411)
upload_result = hf_storage.upload_lora(lora_dir, training_config=config)
if upload_result and 'repo_id' in upload_result:
    # Success - show URLs
    progress += f"\n✅ LoRA uploaded successfully!"
    progress += f"\n🔗 Model: {upload_result['repo_id']}"
    progress += f"\n📚 Collection: https://huggingface.co/collections/Gamahea/lemm-100-pre-beta"
```
## 📦 Name Conflict Resolution
All import functions implement automatic name conflict resolution:
1. **First Check**: Try original name
2. **If Exists**: Append `_1`, `_2`, `_3`, etc.
3. **Update Metadata**: Store new name in `dataset_info.json` or `metadata.json`
4. **Log Action**: Inform user of renaming
### Example Flow
```
Original: my_dataset
Already exists → my_dataset_1
Already exists → my_dataset_2
Available → Use my_dataset_2 ✅
```
## 🔄 Automatic Workflows
### On App Startup
1. Check HF collection for LoRAs
2. Compare with local `models/loras/` directory
3. Download any missing LoRAs
4. Log sync results
### After LoRA Training
1. Train LoRA adapter locally
2. Upload to HF as individual model repo
3. Add to collection
4. Return URLs for viewing
### Dataset Import
1. User uploads ZIP file
2. Extract and validate structure
3. Check for name conflicts
4. Copy to `training_data/` directory
5. Update dropdown lists
## 🛠️ Technical Details
### File Structure Support
**LoRA ZIP Files** (both supported):
```
Option 1 (root):
my_lora.zip/
├── metadata.json
├── adapter_config.json
└── adapter_model.safetensors

Option 2 (subfolder):
my_lora.zip/
└── my_lora/
    ├── metadata.json
    ├── adapter_config.json
    └── adapter_model.safetensors
```
**Dataset ZIP Files** (both supported):
```
Option 1 (root):
my_dataset.zip/
├── dataset_info.json
├── audio/
│   ├── sample_000001.wav
│   └── sample_000002.wav
└── splits.json

Option 2 (subfolder):
my_dataset.zip/
└── my_dataset/
    ├── dataset_info.json
    ├── audio/
    └── splits.json
```
### Error Handling
All import/sync functions include:
- **Try-catch blocks** for graceful error handling
- **Comprehensive logging** with context
- **User-friendly error messages**
- **Fallback behavior** (e.g., save locally if upload fails)
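The fallback pattern can be sketched generically. Here `upload_with_fallback` and its `upload_fn` parameter are illustrative stand-ins; the real code wraps `HFStorageService.upload_lora` directly:

```python
import logging

logger = logging.getLogger("lemm.hf")

def upload_with_fallback(upload_fn, lora_dir):
    """Try the HF upload; on any failure keep the local copy and log a warning."""
    try:
        result = upload_fn(lora_dir)
        logger.info("Uploaded %s to %s", lora_dir, result.get("repo_id"))
        return result
    except Exception as exc:
        # Training output is already on disk, so failure is non-fatal
        logger.warning("Upload failed (%s); LoRA kept locally at %s", exc, lora_dir)
        return None
```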
## 📊 HuggingFace Collection Structure
**Collection**: `Gamahea/lemm-100-pre-beta`
- **Purpose**: Organize all LEMM LoRA adapters
- **Visibility**: Public
- **Items**: Individual model repos
**Model Repos**: `Gamahea/lemm-lora-{name}`
- **Type**: LoRA adapters (safetensors)
- **Metadata**: Training config, dataset info, creation date
- **Files**: adapter_model.safetensors, adapter_config.json, metadata.json
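A repo's `metadata.json` might look roughly like this (field names and values are illustrative assumptions based on the list above, not copied from the actual service):

```json
{
  "name": "jazz_lora",
  "created_at": "2025-01-01T00:00:00Z",
  "dataset": "my_dataset",
  "training_config": {
    "rank": 16,
    "learning_rate": 0.0001,
    "epochs": 10
  }
}
```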
## 🎯 User Workflows
### Train & Share a LoRA
1. Prepare dataset (curated or user audio)
2. Configure training parameters
3. Click "Start Training"
4. Wait for completion
5. LoRA automatically uploaded to HF collection
6. Share collection link with others
### Use Someone's LoRA
1. Open LEMM Space
2. App automatically syncs LoRAs from collection
3. Select LoRA in generation dropdown
4. Generate music with custom style
### Import a Dataset
1. Export dataset from another LEMM instance
2. Click "Import Dataset" in training tab
3. Upload ZIP file
4. Dataset appears in training dropdown
5. Use for LoRA training
## 🔗 Related Files
- **HF Storage Service**: [backend/services/hf_storage_service.py](backend/services/hf_storage_service.py)
- **Dataset Service**: [backend/services/dataset_service.py](backend/services/dataset_service.py)
- **Main App**: [app.py](app.py)
- **LoRA Training Service**: [backend/services/lora_training_service.py](backend/services/lora_training_service.py)
## 📝 Commit History
- **17f5813** (latest): Add dataset import & LoRA collection sync
- `import_prepared_dataset()` method
- `sync_on_startup()` method
- Enhanced `upload_lora()` with training_config
- Numeric suffix naming for conflicts
- **f65e448**: Fixed LoRA import to support both ZIP structures
- **2f0c8b4**: Added "Load for Training" workflow
- **b40ee5f**: Fixed DataFrame handling in dataset preparation
## 🎉 Result
**Complete HuggingFace ecosystem integration!**
- ✅ Auto-sync LoRAs from collection
- ✅ Upload trained LoRAs to collection
- ✅ Import/export datasets
- ✅ Name conflict resolution
- ✅ Comprehensive error handling
- ✅ User-friendly feedback

All three issues from the screenshots are now resolved! 🚀